Robust Design for Coalescent Model Inference

2018
Author(s):
Kris V Parag
Oliver G Pybus

Abstract—The coalescent process describes how changes in the size of a population influence the genealogical patterns of sequences sampled from that population. The estimation of population size changes from genealogies that are reconstructed from these sequence samples is an important problem in many biological fields. Often, population size is characterised by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends both on the statistical coalescent inference method employed and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature devoted to coalescent inference methodology, there is surprisingly little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling for the structured coalescent model, and time discretisation for sequentially Markovian coalescent models. In all cases we prove that the combination of (i) working in the logarithm of the parameters to be inferred (e.g. population size) and (ii) distributing informative coalescent events uniformly among these log-parameters is uniquely robust. ‘Robust’ means that the total and maximum uncertainty of our estimates are minimised, and are also insensitive to their unknown (true) parameter values. Given its persistence among models, this formally derived two-point theorem may form the basis of an experimental design paradigm for coalescent inference.
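A standard Fisher-information calculation can motivate why the log scale and a uniform spread of events go together. This is a sketch under simplifying assumptions of our own, not the authors' proof. Piece j of a skyline with p pieces contains m_j coalescent events, treated as fixed by design. During the i-th time segment of that piece, of length Δt_i, k_i lineages are present, and S_j collects the scaled waiting times. Because E[S_j] = m_j N_j, the information about N_j depends on the unknown N_j, whereas the information about log N_j does not. With n sampled sequences there are n − 1 events to share out, and minimizing the summed approximate variances 1/m_j over that fixed total favours an even split:

$$
\ell_j(N_j) = -m_j \log N_j - \frac{S_j}{N_j}, \qquad S_j = \sum_{i \in j} \binom{k_i}{2} \Delta t_i
$$

$$
\mathcal{I}(N_j) = -\mathbb{E}\left[\frac{\partial^2 \ell_j}{\partial N_j^2}\right] = -\frac{m_j}{N_j^2} + \frac{2\,\mathbb{E}[S_j]}{N_j^3} = \frac{m_j}{N_j^2}, \qquad \mathcal{I}(\log N_j) = N_j^2\,\mathcal{I}(N_j) = m_j
$$

$$
\min_{m_1, \dots, m_p} \sum_{j=1}^{p} \frac{1}{m_j} \quad \text{subject to} \quad \sum_{j=1}^{p} m_j = n - 1 \quad \Longrightarrow \quad m_j = \frac{n-1}{p}
$$

The same even allocation also minimizes the largest term, max_j 1/m_j, because it maximizes the smallest m_j.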

2019
Vol 68 (5)
pp. 730-743
Author(s):
Kris V Parag
Oliver G Pybus

Abstract The coalescent process describes how changes in the size or structure of a population influence the genealogical patterns of sequences sampled from that population. The estimation of (effective) population size changes from genealogies that are reconstructed from these sampled sequences is an important problem in many biological fields. Often, population size is characterized by a piecewise-constant function, with each piece serving as a population size parameter to be estimated. Estimation quality depends both on the statistical coalescent inference method employed and on the experimental protocol, which controls variables such as the sampling of sequences through time and space, or the transformation of model parameters. While there is an extensive literature on coalescent inference methodology, there is comparatively little work on experimental design. The research that does exist is largely simulation-based, precluding the development of provable or general design theorems. We examine three key design problems: temporal sampling of sequences under the skyline demographic coalescent model, spatio-temporal sampling under the structured coalescent model, and time discretization for sequentially Markovian coalescent models. In all cases, we prove that the combination of 1) working in the logarithm of the parameters to be inferred (e.g., population size) and 2) distributing informative coalescent events uniformly among these log-parameters is uniquely robust. “Robust” means that the total and maximum uncertainty of our parameter estimates are minimized, and made insensitive to their unknown (true) values. This robust design theorem provides rigorous justification for several existing coalescent experimental design decisions and leads to usable guidelines for future empirical or simulation-based investigations. Given its persistence among models, this theorem may form the basis of an experimental design paradigm for coalescent inference.
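A quick Monte Carlo check of the log-scale claim follows. It is a minimal sketch that assumes a single skyline piece in which each coalescent event's scaled waiting time, binom(k, 2)·Δt, is exponential with mean N, so that the maximum-likelihood estimate is N̂ = S/m. The helper names `log_estimate_variance` and `total_uncertainty` are hypothetical and are not from the paper. Under this assumption the error log N̂ − log N has a distribution that does not involve N at all. Its variance is the trigamma function at m, which is close to 1/m. An even split of events should therefore give a smaller summed variance than a skewed split.

```python
import math
import random
import statistics


def log_estimate_variance(true_n, m, reps, rng):
    """Monte Carlo variance of log(N_hat) for one skyline piece containing m coalescent events.

    Assumption: each event's scaled waiting time binom(k, 2) * dt is exponential with
    mean N, so their sum S is Gamma(shape=m, scale=N) and the MLE is N_hat = S / m."""
    logs = [math.log(rng.gammavariate(m, true_n) / m) for _ in range(reps)]
    return statistics.variance(logs)


def total_uncertainty(allocation):
    """Sum of the approximate log-scale variances 1 / m_j over the pieces."""
    return sum(1.0 / m for m in allocation)


rng = random.Random(1)
for true_n in (1e2, 1e4):
    v = log_estimate_variance(true_n, m=20, reps=20000, rng=rng)
    print(f"N = {true_n:g}: var(log N_hat) = {v:.4f}, 1/m = {1 / 20:.4f}")

# Same total of 30 events, split evenly versus unevenly across three pieces.
print(total_uncertainty([10, 10, 10]), total_uncertainty([25, 3, 2]))
```

The two printed variances should be nearly equal even though the true N differs by a factor of 100. That is the "insensitive to the unknown true value" property, seen on the log scale.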


Author(s):
Madoka Muroishi
Akira Yakita

Abstract Using a small, open, two-region economy model populated by two-period-lived overlapping generations, we analyze the long-term agglomeration-economy and congestion-diseconomy effects of young-worker concentration on migration and on the overall fertility rate. When the migration-stability condition is satisfied, the distribution of young workers between regions is obtainable in each period for a predetermined population size. Results show that migration stability does not guarantee dynamic stability of the economy. The stability of the stationary population size depends on the model parameters and the initial population size. On a stable trajectory converging to the stationary equilibrium, the overall fertility rate might change non-monotonically with the population size of the economy because of inter-regional migration. In each period, inter-regional migration mitigates regional population changes caused by fertility differences on the stable path. Results show that the inter-regional migration-stability condition does not guarantee stability of the population dynamics of the economy.


2020
Vol 53 (3)
pp. 800-810
Author(s):
Frank Heinrich
Paul A. Kienzle
David P. Hoogerheide
Mathias Lösche

A framework for quantifying the information gain from neutron or X-ray reflectometry experiments [Treece, Kienzle, Hoogerheide, Majkrzak, Lösche & Heinrich (2019). J. Appl. Cryst. 52, 47–59] is applied in an in-depth investigation into the design of scattering contrast in biological and soft-matter surface architectures. To focus the experimental design on regions of interest, the information gain is marginalized with respect to a subset of the model parameters describing the structure. Surface architectures of increasing complexity, from a simple model system to a protein–lipid membrane complex, are simulated. The information gain from virtual surface scattering experiments is quantified as a function of the scattering length densities of the molecular components of the architecture and of the surrounding aqueous bulk solvent. It is concluded that the information gain is mostly determined by the local scattering contrast between a feature of interest and its immediate molecular environment, and that experimental design should primarily focus on this region. The overall signal-to-noise ratio of the measured reflectivity modulates the information gain globally and is a second factor to consider.
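Information gain and its marginalized form can be illustrated as the entropy of the prior minus the entropy of the posterior, optionally restricted to the parameters of interest. This is a minimal sketch under assumptions that are ours, not the paper's. First, prior and posterior are both multivariate normal, and the posterior comes from a linearized model y = Jθ + noise. The published framework may estimate these entropies differently. Second, the two-parameter model, the Jacobian entries and the "contrast" values are hypothetical stand-ins, not a reflectometry model.

```python
import numpy as np


def gaussian_entropy_bits(cov):
    """Differential entropy (bits) of a multivariate normal with covariance `cov`."""
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet) / np.log(2)


def information_gain_bits(prior_cov, post_cov, subset=None):
    """Prior entropy minus posterior entropy, optionally for a marginal subset of parameters."""
    if subset is not None:
        idx = np.ix_(subset, subset)
        prior_cov, post_cov = prior_cov[idx], post_cov[idx]
    return gaussian_entropy_bits(prior_cov) - gaussian_entropy_bits(post_cov)


def linearized_posterior(prior_cov, jacobian, noise_sd):
    """Posterior covariance of a linear-Gaussian model y = J @ theta + noise."""
    precision = np.linalg.inv(prior_cov) + jacobian.T @ jacobian / noise_sd ** 2
    return np.linalg.inv(precision)


# Hypothetical two-parameter model: theta[0] is the feature of interest, theta[1] a
# nuisance parameter. The first Jacobian entry stands in for the local contrast.
prior = np.eye(2)
for contrast in (0.3, 3.0):
    post = linearized_posterior(prior, np.array([[contrast, 1.0]]), noise_sd=1.0)
    print(f"contrast {contrast}: marginal gain "
          f"{information_gain_bits(prior, post, subset=[0]):.3f} bits, "
          f"full gain {information_gain_bits(prior, post):.3f} bits")
```

In this toy setting, raising the sensitivity to the feature of interest raises the marginal gain for that parameter. The marginal gain can never exceed the gain over all parameters.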


1965
Vol 20 (1)
pp. 121-122
Author(s):  
Edward A. Bilodeau

A tiny experiment was reported by Dyal (1964) with results apparently contradicting the bulk of an extensive literature he failed to cite. The literature contains far better experimental designs, resources, and discussion of the issues.


2000
Vol 4 (3)
pp. 483-498
Author(s):
M. Franchini
A. M. Hashemi
P. E. O’Connell

Abstract. The sensitivity analysis described in Hashemi et al. (2000) is based on one-at-a-time perturbations to the model parameters. This type of analysis cannot reveal parameter interactions, which might affect the characteristics of the flood frequency curve (ffc) even more than the individual parameters do. For this reason, the effects on the ffc of the parameters of the rainfall and rainfall-runoff models, and of the potential evapotranspiration demand, are investigated here through a factorial experimental design in which all the parameters are allowed to vary simultaneously. This more complex analysis confirms the results obtained in Hashemi et al. (2000), showing that the conclusions drawn there have wider validity and are not tied strictly to the reference set selected. However, two-factor interactions are shown to be present not only between pairs of parameters of an individual model, but also between pairs of parameters of different models, such as the rainfall and rainfall-runoff models. This demonstrates the complex interaction between climate and basin characteristics that affects the ffc, and in particular its curvature. Furthermore, the wider range of climatic regime behaviour produced within the factorial experimental design shows that the probability distribution of soil moisture content at the storm arrival time is no longer sufficient to explain the link between the parameter perturbations and their effects on the ffc, as was suggested in Hashemi et al. (2000). Other factors have to be considered, such as the probability distribution of the soil moisture capacity and the rainfall regime, expressed through the annual maximum rainfalls over different durations.
Keywords: Monte Carlo simulation; factorial experimental design; analysis of variance (ANOVA)
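The difference between one-at-a-time perturbation and a factorial design is easy to see on a toy response with a two-factor interaction. The sketch below uses hypothetical factors A, B and C coded at ±1 and an illustrative response function that has nothing to do with the rainfall or rainfall-runoff models. It estimates main effects and two-factor interactions from a full 2^3 factorial by the usual contrast averages, then compares them with one-at-a-time changes from an all-low baseline.

```python
from itertools import combinations, product

FACTORS = ("A", "B", "C")  # hypothetical factors, e.g. parameters of two different models


def response(levels):
    """Toy response with an A x B interaction; coefficients are illustrative only."""
    a, b, c = levels["A"], levels["B"], levels["C"]
    return 10 + 2 * a + 1 * b + 0 * c + 3 * a * b


def full_factorial(factors):
    """All 2**k runs with each factor coded at -1 (low) and +1 (high)."""
    return [dict(zip(factors, combo)) for combo in product((-1, 1), repeat=len(factors))]


def effect(runs, ys, contrast):
    """Mean response where the contrast is +1 minus mean response where it is -1."""
    hi = [y for run, y in zip(runs, ys) if contrast(run) > 0]
    lo = [y for run, y in zip(runs, ys) if contrast(run) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


runs = full_factorial(FACTORS)
ys = [response(run) for run in runs]

main_effects = {f: effect(runs, ys, lambda r, f=f: r[f]) for f in FACTORS}
interactions = {
    f + g: effect(runs, ys, lambda r, f=f, g=g: r[f] * r[g])
    for f, g in combinations(FACTORS, 2)
}

# One-at-a-time: move each factor from an all-low baseline and record the change.
baseline = {f: -1 for f in FACTORS}
one_at_a_time = {f: response({**baseline, f: 1}) - response(baseline) for f in FACTORS}

print(main_effects)   # {'A': 4.0, 'B': 2.0, 'C': 0.0}
print(interactions)   # {'AB': 6.0, 'AC': 0.0, 'BC': 0.0}
print(one_at_a_time)  # {'A': -2, 'B': -4, 'C': 0}
```

Here the one-at-a-time changes even have the wrong sign for A and B, because the A×B interaction dominates at the chosen baseline. The factorial contrasts separate it out. In practice these effects feed into an ANOVA.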


2016
Vol 2016
pp. 1-14
Author(s):
Lin-Ping Song
Leonard R. Pasion
Nicolas Lhomme
Douglas W. Oldenburg

This work, under the optimal experimental design framework, investigates the sensor placement problem of guiding electromagnetic induction (EMI) sensing of multiple objects. We use the linearized model covariance matrix as a measure of estimation error to develop a sequential experimental design (SED) technique. The technique recursively minimizes the data misfit to update the model parameters and maximizes an information gain function for a future survey relative to previous surveys. In essence, SED places sensors so as to increase the weighted sensitivities to the targets. Synthetic and field experiments demonstrate that SED can be used to guide the sensing process for an effective interrogation. It can also serve as a theoretical basis for improving empirical survey operations. We further study the sensitivity of SED to the number of objects within the sensing range. The tests suggest that a model that appropriately overrepresents the expected anomalies may be a feasible choice.
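One step of a sequential design can be sketched as follows: given the linearized posterior covariance from previous surveys, pick the candidate sensor whose measurement most reduces posterior entropy. This is a generic greedy, D-optimal-style criterion with fixed linear sensitivities, Gaussian noise and hypothetical numbers. It is not the authors' SED, which also re-estimates model parameters by misfit minimization and weights the sensitivities. The matrix determinant lemma gives the entropy reduction of one scalar measurement with sensitivity row j as ½·log(1 + jᵀCj/σ²).

```python
import numpy as np


def posterior_covariance(rows, prior_cov, noise_var):
    """Linearized (Gaussian) posterior covariance after observing sensitivity rows."""
    precision = np.linalg.inv(prior_cov)
    for r in rows:
        precision = precision + np.outer(r, r) / noise_var
    return np.linalg.inv(precision)


def sensor_gain(row, cov, noise_var):
    """Entropy reduction (nats) from one extra scalar measurement with sensitivity `row`."""
    return 0.5 * np.log1p(row @ cov @ row / noise_var)


def greedy_design(candidates, previous_rows, prior_cov, n_new, noise_var=1.0):
    """Choose n_new candidate sensors one at a time, updating the covariance after each."""
    rows = list(previous_rows)
    remaining = list(range(len(candidates)))
    chosen = []
    for _ in range(n_new):
        cov = posterior_covariance(rows, prior_cov, noise_var)
        best = max(remaining, key=lambda i: sensor_gain(candidates[i], cov, noise_var))
        chosen.append(best)
        remaining.remove(best)
        rows.append(candidates[best])
    return chosen


# Hypothetical: a previous survey constrained parameter 0 strongly but not parameter 1.
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
previous = [np.array([10.0, 0.0])]
print(greedy_design(candidates, previous, prior_cov=np.eye(2), n_new=2))
```

The first pick is the sensor sensitive to the poorly constrained parameter, which mirrors the idea of placing sensors where they add information relative to previous surveys.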


2016
Vol 73 (9)
pp. 2178-2180
Author(s):
W. Stewart Grant
Einar Árnason
Bjarki Eldon

Abstract Analyses of the often large amounts of field and laboratory data depend on computer programs to generate descriptive statistics and to test hypotheses. The algorithms in these programs are often complex and can be understood only with advanced training in mathematics and programming, topics that are beyond the expertise of most fisheries biologists and empirical population geneticists. The backward-looking Kingman coalescent model, based on the classic forward-looking Wright–Fisher model of genetic change, is used in many genetics software programs to generate null distributions against which to test hypotheses. An article in this issue by Niwa et al. shows that the Kingman coalescent's assumption of bifurcations at nodes is inappropriate for highly fecund Japanese sardines, which have type III life histories. Species with this life-history pattern are better modelled with multiple mergers at the nodes of a coalescent gene genealogy. However, only a few software programs allow analysis with multiple-merger coalescent models. This misspecification produces demographic reconstructions that reach too far into the past, and it greatly overestimates genetically effective population sizes (the number of individuals actually contributing to the next generation). The results of Niwa et al. underline the need to understand the assumptions and model parameters in the software programs used to analyse DNA sequences.
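The contrast between bifurcating and multiple-merger genealogies can be sketched by simulating both. The Kingman part is standard: with k lineages, the next pairwise merger arrives at rate k(k−1)/2, and the expected TMRCA is 2(1 − 1/n) coalescent units. For the multiple-merger side we assume, purely for illustration, a Λ-coalescent whose Λ is a point mass at ψ. We do not know which multiple-merger model Niwa et al. used. That process is simulated by its Poisson construction: candidate events arrive at rate ψ⁻², each lineage joins independently with probability ψ, and two or more participants merge. The time scales of the two processes are not directly comparable.

```python
import random
import statistics


def kingman(n, rng):
    """Kingman coalescent: only pairwise mergers; with k lineages the waiting time
    is exponential with rate k(k-1)/2 (time in coalescent units)."""
    t, k, sizes = 0.0, n, []
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2)
        sizes.append(2)
        k -= 1
    return t, sizes


def dirac_lambda(n, psi, rng):
    """Multiple-merger Lambda-coalescent with Lambda = point mass at psi (0 < psi <= 1).

    Poisson construction: candidate events arrive at rate psi**-2 and each lineage
    joins independently with probability psi; if m >= 2 join, they merge into one."""
    if not 0 < psi <= 1:
        raise ValueError("psi must lie in (0, 1]")
    t, k, sizes = 0.0, n, []
    while k > 1:
        t += rng.expovariate(psi ** -2)
        m = sum(rng.random() < psi for _ in range(k))
        if m >= 2:
            sizes.append(m)
            k -= m - 1
    return t, sizes


rng = random.Random(42)
mean_tmrca = statistics.mean(kingman(10, rng)[0] for _ in range(4000))
print(f"Kingman mean TMRCA: {mean_tmrca:.3f} (theory 2(1 - 1/n) = {2 * (1 - 1 / 10):.3f})")
print("Kingman merger sizes:", kingman(10, rng)[1])
print("Dirac merger sizes:  ", dirac_lambda(10, 0.5, rng)[1])
```

Every Kingman node joins exactly two lineages, while the Λ-coalescent regularly produces nodes with many children. A genealogy of a highly fecund species may look like the second case.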


1994
Vol 116 (3)
pp. 529-536
Author(s):
M. A. Hopkins
H. F. VanLandingham

This paper presents a new nonlinear method of simultaneous parameter and state estimation, called pseudo-linear identification (PLID), for stochastic linear time-invariant discrete-time systems. No assumptions are required about pole or zero locations, nor about relative degree, except that the system transfer function must be strictly proper. Under standard Gaussian assumptions, for completely controllable and observable systems, PLID is proved to be the minimum mean-square-error estimator of the states and model parameters, conditioned on the input and output measurements. It is also proved that, given persistent excitation, the parameter estimates converge a.e. to the true parameter values. All results have been extended to the multiple-input, multiple-output case, but the single-input, single-output case is presented here to simplify notation.
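The role of persistent excitation in parameter convergence can be illustrated with a much simpler estimator. The sketch below is ordinary recursive least squares (RLS) on a first-order ARX model, swapped in for illustration. It is not PLID and does not estimate states. The model coefficients, noise level and random ±1 input are hypothetical choices. The random binary input is persistently exciting, and the recursive estimates settle near the true coefficients.

```python
import numpy as np


def simulate_arx(a, b, n, noise_sd, rng):
    """y[t] = a*y[t-1] + b*u[t-1] + e[t], driven by a random +/-1 input (persistently exciting)."""
    u = rng.choice([-1.0, 1.0], size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = a * y[t - 1] + b * u[t - 1] + noise_sd * rng.standard_normal()
    return u, y


def rls(u, y, p0=1000.0):
    """Recursive least squares for theta = [a, b] with regressor phi = [y[t-1], u[t-1]]."""
    theta = np.zeros(2)
    P = p0 * np.eye(2)  # large initial covariance = weak prior on theta
    for t in range(1, len(y)):
        phi = np.array([y[t - 1], u[t - 1]])
        gain = P @ phi / (1.0 + phi @ P @ phi)
        theta = theta + gain * (y[t] - phi @ theta)
        P = P - np.outer(gain, phi @ P)
    return theta


rng = np.random.default_rng(0)
u, y = simulate_arx(a=0.7, b=0.5, n=2000, noise_sd=0.1, rng=rng)
print(rls(u, y))  # should lie close to [0.7, 0.5]
```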


2015
Author(s):
Julia A Palacios
John Wakeley
Sohini Ramachandran

Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Accurate methods are available for data from a single locus or from independent loci. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model which allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90 percent of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.
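The two evaluation criteria in the abstract are simple to compute once truth, estimate and credible bands are evaluated on a common time grid. Those criteria are the sum of absolute differences between true and estimated values, and the fraction of true values covered by the credible intervals. The sketch assumes such a grid and uses hypothetical numbers. The abstract does not say here whether the comparison is made on the natural or the log scale. A relative statement such as "four times more accurate" would then be a ratio of two such sums.

```python
def sum_abs_error(truth, estimate):
    """Sum of |truth - estimate| over a common time grid."""
    if len(truth) != len(estimate):
        raise ValueError("truth and estimate must share a grid")
    return sum(abs(t - e) for t, e in zip(truth, estimate))


def coverage(truth, lower, upper):
    """Fraction of grid points whose true value lies inside [lower, upper]."""
    if not len(truth) == len(lower) == len(upper):
        raise ValueError("inputs must share a grid")
    inside = sum(lo <= t <= hi for t, lo, hi in zip(truth, lower, upper))
    return inside / len(truth)


# Hypothetical values on a four-point time grid.
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [1.5, 2.0, 2.0, 4.5]
print(sum_abs_error(truth, estimate))  # 2.0
print(coverage(truth, [0.5, 2.5, 2.0, 3.0], [2.0, 3.0, 4.0, 5.0]))  # 0.75
```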


2018
Vol 5 (8)
pp. 180384
Author(s):
Andrew Parker
Matthew J. Simpson
Ruth E. Baker

To better understand development, repair and disease progression, it is useful to quantify the behaviour of proliferative and motile cell populations as they grow and expand to fill their local environment. Inferring parameters associated with mechanistic models of cell colony growth using quantitative data collected from carefully designed experiments provides a natural means to elucidate the relative contributions of various processes to the growth of the colony. In this work, we explore how experimental design impacts our ability to infer parameters for simple models of the growth of proliferative and motile cell populations. We adopt a Bayesian approach, which allows us to characterize the uncertainty associated with estimates of the model parameters. Our results suggest that experimental designs that incorporate initial spatial heterogeneities in cell positions facilitate parameter inference without the requirement of cell tracking, while designs that involve uniform initial placement of cells require cell tracking for accurate parameter inference. As cell tracking is an experimental bottleneck in many studies of this type, our recommendations for experimental design offer potentially significant time and cost savings in the analysis of cell colony growth.
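Bayesian uncertainty quantification and its dependence on design can be sketched with a far simpler model than the paper's. The model is non-spatial logistic growth of a cell count with known initial count and carrying capacity. It has a flat prior on the proliferation rate λ and Gaussian count noise, and it is evaluated on a grid. All of these choices, and every number, are hypothetical. Spatial structure, motility and cell tracking are not represented. Comparing an "early only" observation schedule with a full time course shows how the design changes the posterior spread.

```python
import numpy as np


def logistic(t, lam, c0, k):
    """Logistic growth of a cell count from c0 towards carrying capacity k at rate lam."""
    return k * c0 / (c0 + (k - c0) * np.exp(-lam * t))


def posterior_summary(t, counts, lam_grid, c0, k, noise_sd):
    """Grid posterior for lam: flat prior, independent Gaussian count noise."""
    sse = np.array([np.sum((counts - logistic(t, lam, c0, k)) ** 2) for lam in lam_grid])
    log_post = -sse / (2 * noise_sd ** 2)
    w = np.exp(log_post - log_post.max())  # subtract max to avoid underflow
    w /= w.sum()
    mean = np.sum(lam_grid * w)
    sd = np.sqrt(np.sum((lam_grid - mean) ** 2 * w))
    return mean, sd


rng = np.random.default_rng(1)
true_lam, c0, k, noise_sd = 0.5, 50.0, 1000.0, 5.0
lam_grid = np.linspace(0.01, 2.0, 20001)

designs = {
    "early only (t = 0..3)": np.arange(0.0, 4.0),
    "full time course (t = 0..20)": np.arange(0.0, 21.0),
}
for name, t in designs.items():
    counts = logistic(t, true_lam, c0, k) + rng.normal(0.0, noise_sd, t.size)
    mean, sd = posterior_summary(t, counts, lam_grid, c0, k, noise_sd)
    print(f"{name}: posterior mean {mean:.3f}, sd {sd:.4f}")
```

The posterior standard deviation is the design-dependent quantity. Both schedules recover λ on average, but the full time course constrains it much more tightly.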

