Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

2019 ◽  
Author(s):  
Joshua G. Harrison ◽  
W. John Calder ◽  
Vivaswat Shastry ◽  
C. Alex Buerkle

Abstract Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modeled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet-multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM—Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic, bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with different statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
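The hierarchical sampling scheme the abstract describes can be sketched in a few lines: group-level proportions are drawn from a Dirichlet distribution, and each replicate's counts are then a multinomial draw conditional on those proportions. All parameter values below (taxon proportions, concentration, sequencing depth) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical community: 5 taxa with assumed mean relative abundances.
base = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
precision = 50.0          # Dirichlet concentration; higher -> less overdispersion

# Hierarchical sampling: each replicate gets its own proportions from the
# Dirichlet, then counts are a multinomial draw at a fixed sequencing depth.
n_replicates, depth = 10, 2000
props = rng.dirichlet(base * precision, size=n_replicates)
counts = np.array([rng.multinomial(depth, p) for p in props])

# Observed relative abundances vary around the Dirichlet mean across replicates.
observed = counts / depth
```

Fitting the model in the other direction (inferring `base` and `precision` from `counts` via HMC or VI, as the paper does) requires a probabilistic programming framework such as Stan; the forward simulation above only illustrates the generative structure.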

1989 ◽  
Vol 46 (9) ◽  
pp. 1499-1509
Author(s):  
James W. Haefner ◽  
Linda C. Abbott

A simple model was developed to extrapolate laboratory dose–response data to randomly varying conditions. We fit published data of dose–response experiments for the effects of low pH stress on survival rates of stream macro-invertebrates to a modification of the Weibull distribution. Using the resulting parameter estimates and Monte Carlo simulation, we compared the values obtained in constant laboratory conditions with the expected survival rates obtained in fluctuating environments. For each of three species, we performed 108 Monte Carlo experiments in a full factorial design that varied the mean pH, the standard deviation of pH fluctuations, the distribution from which pH values were drawn, the distributions of runs of constant pH, and the presence of episodic events. Fluctuating environments decreased the survival rates of resistant species, but increased survivorship of sensitive species. No one exposure duration under laboratory conditions could consistently be extrapolated to the suite of variable environments we examined. Probit analyses performed on the observed and simulated data indicated that LC-50s of different observed exposure durations were similar to each other and to the LC-50s of the simulated data assuming an exposure of 24 h. Based on these results, we recommend that toxicity studies incorporate temporal variability directly by using varying dose levels in laboratory tests.
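The core extrapolation step described above—evaluating a fitted dose–response curve under randomly varying conditions via Monte Carlo—can be sketched as follows. The Weibull form and all parameter values are hypothetical stand-ins, not the paper's fitted estimates, and "dose" is taken here as acidity below pH 7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Weibull dose-response: survival as a function of acid stress
# (dose = 7 - pH). Parameters lam and k are illustrative, not fitted values.
lam, k = 2.5, 3.0

def survival(ph):
    dose = np.clip(7.0 - ph, 0.0, None)
    return np.exp(-(dose / lam) ** k)

# Compare a constant laboratory pH with a fluctuating environment that has
# the same mean pH, as in the paper's Monte Carlo experiments.
mean_ph, sd_ph, n = 5.0, 0.5, 100_000
s_constant = survival(mean_ph)
s_fluct = survival(rng.normal(mean_ph, sd_ph, n)).mean()
```

Because the dose–response curve is nonlinear, the expected survival under fluctuating pH generally differs from survival at the mean pH, which is the effect the factorial simulations quantify.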


2021 ◽  
Vol 24 (1) ◽  
pp. 112-136
Author(s):  
Elvira Di Nardo ◽  
Federico Polito ◽  
Enrico Scalas

Abstract This paper is devoted to a fractional generalization of the Dirichlet distribution. The form of the multivariate distribution is derived assuming that the n partitions of the interval [0, W_n] are independent and identically distributed random variables following the generalized Mittag-Leffler distribution. The expected value and variance of the one-dimensional marginal are derived as well as the form of its probability density function. A related generalized Dirichlet distribution is studied that provides a reasonable approximation for some values of the parameters. The relation between this distribution and other generalizations of the Dirichlet distribution is discussed. Monte Carlo simulations of the one-dimensional marginals for both distributions are presented.
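The classical construction that this paper generalizes is worth making concrete: a Dirichlet vector arises by normalizing independent Gamma-distributed partitions; the fractional version replaces the Gamma draws with generalized Mittag-Leffler draws (whose simulation is beyond a short sketch). The shape parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Baseline construction: normalize i.i.d. Gamma partitions to obtain a
# Dirichlet(alpha) vector. The paper's fractional generalization swaps the
# Gamma variables for generalized Mittag-Leffler variables (not sketched).
alpha = np.array([1.0, 2.0, 3.0])     # illustrative shape parameters
n_samples = 50_000
g = rng.gamma(shape=alpha, size=(n_samples, alpha.size))
x = g / g.sum(axis=1, keepdims=True)  # each row is a Dirichlet(alpha) sample

# Marginal means should match alpha / alpha.sum() = [1/6, 2/6, 3/6].
emp_mean = x.mean(axis=0)
```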


2004 ◽  
Vol 2004 (8) ◽  
pp. 421-429 ◽  
Author(s):  
Souad Assoudou ◽  
Belkheir Essebbar

This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on the Jeffreys prior, which allows the transition probabilities to be correlated. The Bayesian estimator is approximated by means of Markov chain Monte Carlo (MCMC) techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.
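A simplified version of this setup can be sketched with independent Jeffreys Beta(1/2, 1/2) priors on each row of the transition matrix, for which the posterior mean is available in closed form. This is a deliberate simplification: the paper's Jeffreys prior additionally correlates the transition probabilities, which is why MCMC is needed there. The true transition matrix and chain length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a binary chain with true transition matrix P[i, j] = P(next=j | cur=i).
P_true = np.array([[0.8, 0.2],
                   [0.4, 0.6]])
chain = [0]
for _ in range(5000):
    chain.append(rng.choice(2, p=P_true[chain[-1]]))
chain = np.array(chain)

# Tally transition counts n[i, j].
n = np.zeros((2, 2))
for cur, nxt in zip(chain[:-1], chain[1:]):
    n[cur, nxt] += 1

# Posterior mean under independent Jeffreys Beta(1/2, 1/2) priors per row
# (the paper's correlated Jeffreys prior has no such closed form).
post_mean = (n + 0.5) / (n.sum(axis=1, keepdims=True) + 1.0)
```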


2021 ◽  
Vol 60 (4) ◽  
pp. 513-526
Author(s):  
Bhupendra A. Raut ◽  
Robert Jackson ◽  
Mark Picel ◽  
Scott M. Collis ◽  
Martin Bergemann ◽  
...  

Abstract A robust and computationally efficient object tracking algorithm is developed by incorporating various tracking techniques. Physical properties of the objects, such as brightness temperature or reflectivity, are not considered. Therefore, the algorithm is adaptable for tracking convection-like features in simulated data and remotely sensed two-dimensional images. In this algorithm, a first guess of the motion, estimated using the Fourier phase shift, is used to predict the candidates for matching. A disparity score is computed for each target–candidate pair. The disparity also incorporates overlapping criteria in the case of large objects. Then the Hungarian method is applied to identify the best pairs by minimizing the global disparity. The high-disparity pairs are unmatched, and their target and candidate are declared expired and newly initiated objects, respectively. They are tested for merger and split on the basis of their size and overlap with the other objects. The sensitivity of track duration is shown for different disparity and size thresholds. The paper highlights the algorithm’s ability to study convective life cycles using radar and simulated data over Darwin, Australia. The algorithm skillfully tracks individual convective cells (a few pixels in size) and large convective systems. The duration of tracks and cell size are found to be lognormally distributed over Darwin. The evolution of size and precipitation types of isolated convective cells is presented in the Lagrangian perspective. This algorithm is part of a vision for a modular platform [viz., TINT is not TITAN (TINT) and Tracking and Object-Based Analysis of Clouds (tobac)] that will evolve into a sustainable choice to analyze atmospheric features.
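The matching step described above—minimizing total disparity over target–candidate pairs with the Hungarian method, then unmatching high-disparity pairs—can be sketched with SciPy's assignment solver. The disparity matrix and threshold below are made-up illustrations, not values from TINT.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical disparity matrix: rows are targets (frame t), columns are
# candidates (frame t+1); entries would combine distance, size, and overlap.
disparity = np.array([
    [0.2, 5.0, 9.0],
    [6.0, 0.4, 8.0],
    [7.0, 6.5, 9.5],   # this target has no good candidate
])

# Hungarian method: assignment minimizing the total (global) disparity.
rows, cols = linear_sum_assignment(disparity)

# Pairs above an assumed disparity threshold are unmatched: the target is
# declared expired and the candidate becomes a newly initiated object.
MAX_DISPARITY = 3.0
matched = [(r, c) for r, c in zip(rows, cols) if disparity[r, c] <= MAX_DISPARITY]
expired = [r for r, c in zip(rows, cols) if disparity[r, c] > MAX_DISPARITY]
```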


2017 ◽  
Vol 15 (2) ◽  
pp. e05R01 ◽  
Author(s):  
Francisco J. Diéguez ◽  
Manuel Cerviño ◽  
Eduardo Yus

Bovine viral diarrhea virus (BVDV), a member of the genus Pestivirus of the family Flaviviridae, causes significant losses in cattle farming worldwide because of reduced milk production, increased mortality of young animals, and reproductive, respiratory and intestinal problems. The virus is characterized by considerable genetic, and consequently antigenic and pathogenic, diversity. Knowing the variability of viral strains present in a population provides valuable information, particularly relevant for the development of control programs, vaccination recommendations and even identification of likely infection sources. Such information is therefore important at both local and regional levels. This review focuses on the genetic diversity of BVDV isolates infecting cattle in Spain in recent years. According to the published data, the most prevalent BVDV group in Spain was 1b, followed to a lesser extent by 1d, 1e and 1f. BVDV-2 has also been found in Spain, with several confirmed isolates. The studies carried out in Spain also showed increased genetic heterogeneity of BVDV strains, possibly reflecting more intensive use of the available analytical tools and increasingly large sample sizes.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 998
Author(s):  
Luis Javier Herrera ◽  
Carlos José Todero Peixoto ◽  
Oresti Baños ◽  
Juan Miguel Carceller ◽  
Francisco Carrillo ◽  
...  

The study of cosmic rays remains one of the most challenging research fields in physics. Among the many open questions in this area, determining the type of primary particle for each event remains one of the most important. Cosmic-ray observatories have been working on this question for at least six decades without a definitive answer. The main obstacle is that high-energy primary particles cannot be detected directly, so Monte Carlo models and simulations are needed to characterize the generated particle cascades. This work presents results obtained with a simulated dataset generated by the Monte Carlo code CORSIKA, which simulates the interaction of high-energy particles with the atmosphere, producing a cascade of secondary particles that extends a few kilometers in diameter at ground level. Using this simulated data, a set of machine learning classifiers has been designed and trained, and their computational cost and effectiveness compared, for classifying the type of primary under ideal measuring conditions. Additionally, a feature selection algorithm has allowed the relevance of the considered features to be identified. The results confirm the importance, for this problem, of separating the electromagnetic and muonic components of the measured signal. The obtained results are quite encouraging and open new lines of work for future, more restrictive simulations.
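The classifier-comparison workflow the abstract describes can be sketched with scikit-learn. The synthetic features below are stand-ins for the CORSIKA-derived shower observables (e.g., electromagnetic vs. muonic signal fractions); the classifiers and dataset are illustrative, not the paper's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for simulated shower features; the real study uses
# observables extracted from CORSIKA-simulated particle cascades.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train several classifiers and compare their held-out accuracy.
scores = {}
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))
```

A feature-selection step, as in the paper, could be added with e.g. `sklearn.feature_selection` before training to rank the observables by relevance.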


2017 ◽  
Vol 23 (3) ◽  
pp. 618-633 ◽  
Author(s):  
Nicholas W. M. Ritchie

Abstract Secondary fluorescence, the final term in the familiar matrix correction triumvirate Z·A·F, is the most challenging for Monte Carlo models to simulate. In fact, only two implementations of Monte Carlo models commonly used to simulate electron probe X-ray spectra can calculate secondary fluorescence—PENEPMA and NIST DTSA-II (the latter is discussed herein). These two models share many physical models but there are some important differences in the way each implements X-ray emission, including secondary fluorescence. PENEPMA is based on PENELOPE, a general purpose software package for simulation of both relativistic and subrelativistic electron/positron interactions with matter. On the other hand, NIST DTSA-II was designed exclusively for simulation of X-ray spectra generated by subrelativistic electrons. NIST DTSA-II uses variance reduction techniques unsuited to a general purpose code. These optimizations help NIST DTSA-II to be orders of magnitude more computationally efficient while retaining detector position sensitivity. Simulations execute in minutes rather than hours and can model differences that result from detector position. Both PENEPMA and NIST DTSA-II are capable of handling complex sample geometries, and we will demonstrate that both are of similar accuracy when modeling experimental secondary fluorescence data from the literature.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5226
Author(s):  
Subhrasankha Dey ◽  
Stephan Winter ◽  
Martin Tomko

All established models in transportation engineering that estimate the numbers of trips between origins and destinations from vehicle counts use some form of a priori knowledge of the traffic. This paper, in contrast, presents a new origin–destination flow estimation model that uses only vehicle counts observed by traffic count sensors; it requires neither historical origin–destination trip data for the estimation nor any assumed distribution of flow. This approach utilises a method of statistical origin–destination flow estimation in computer networks, and transfers the principles to the domain of road traffic by applying transport-geographic constraints in order to keep traffic embedded in physical space. Being purely stochastic, our model overcomes the conceptual weaknesses of the existing models, and additionally estimates travel times of individual vehicles. The model has been implemented for a real-world road network in the city of Melbourne, Australia. The model was validated with simulated data and real-world observations from two different data sources. The validation results show that all the origin–destination flows were estimated with good accuracy using link count data only. Additionally, the travel times estimated by the model were close approximations to the observed travel times in the real world.
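The underlying inverse problem—recovering origin–destination flows from link counts—can be illustrated on a toy network where each OD pair follows a fixed route, so that observed link counts are a linear combination of OD flows. This fixed-routing, non-negative least-squares sketch is a deliberate simplification of the paper's purely stochastic model; the network, incidence matrix, and flow values are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

# Toy network: 3 OD pairs, 4 sensor-equipped links. A[i, j] = 1 if OD pair j's
# route traverses link i (a fixed-routing simplification for illustration).
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
f_true = np.array([120.0, 80.0, 50.0])   # hypothetical OD flows (veh/h)
link_counts = A @ f_true                  # what the count sensors would observe

# Recover non-negative OD flows from link counts alone.
f_est, residual = nnls(A, link_counts)
```

With noisy counts and route choice uncertainty the problem becomes underdetermined, which is where the statistical machinery of the paper's model comes in.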

