Dirichlet-multinomial modelling outperforms alternatives for analysis of microbiome and other ecological count data

2019 ◽  
Author(s):  
Joshua G. Harrison ◽  
W. John Calder ◽  
Vivaswat Shastry ◽  
C. Alex Buerkle

Abstract Molecular ecology regularly requires the analysis of count data that reflect the relative abundance of features of a composition (e.g., taxa in a community, gene transcripts in a tissue). The sampling process that generates these data can be modeled using the multinomial distribution. Replicate multinomial samples inform the relative abundances of features in an underlying Dirichlet distribution. These distributions together form a hierarchical model for relative abundances among replicates and sampling groups. This type of Dirichlet-multinomial modelling (DMM) has been described previously, but its benefits and limitations are largely untested. With simulated data, we quantified the ability of DMM to detect differences in proportions between treatment and control groups, and compared the efficacy of three computational methods to implement DMM—Hamiltonian Monte Carlo (HMC), variational inference (VI), and Gibbs Markov chain Monte Carlo. We report that DMM was better able to detect shifts in relative abundances than analogous analytical tools, while identifying an acceptably low number of false positives. Among methods for implementing DMM, HMC provided the most accurate estimates of relative abundances, and VI was the most computationally efficient. The sensitivity of DMM was exemplified through analysis of previously published data describing lung microbiomes. We report that DMM identified several potentially pathogenic, bacterial taxa as more abundant in the lungs of children who aspirated foreign material during swallowing; these differences went undetected with different statistical approaches. Our results suggest that DMM has strong potential as a statistical method to guide inference in molecular ecology.
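The hierarchical sampling scheme the abstract describes can be sketched in a few lines: group-level proportions are drawn from a Dirichlet distribution, and each replicate's counts are then a multinomial draw conditional on those proportions. All parameter values below (taxon proportions, concentration, sequencing depth) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical community: 5 taxa with assumed mean relative abundances.
base = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
precision = 50.0          # Dirichlet concentration; higher -> less overdispersion

# Hierarchical sampling: each replicate gets its own proportions from the
# Dirichlet, then counts are a multinomial draw at a fixed sequencing depth.
n_replicates, depth = 10, 2000
props = rng.dirichlet(base * precision, size=n_replicates)
counts = np.array([rng.multinomial(depth, p) for p in props])

# Observed relative abundances vary around the Dirichlet mean across replicates.
observed = counts / depth
```

Fitting the model in the other direction (inferring `base` and `precision` from `counts` via HMC or VI, as the paper does) requires a probabilistic programming framework such as Stan; the forward simulation above only illustrates the generative structure.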

1989 ◽  
Vol 46 (9) ◽  
pp. 1499-1509
Author(s):  
James W. Haefner ◽  
Linda C. Abbott

A simple model was developed to extrapolate laboratory dose–response data to randomly varying conditions. We fit published data of dose–response experiments for the effects of low pH stress on survival rates of stream macro-invertebrates to a modification of the Weibull distribution. Using the resulting parameter estimates and Monte Carlo simulation, we compared the values obtained in constant laboratory conditions with the expected survival rates obtained in fluctuating environments. For each of three species, we performed 108 Monte Carlo experiments in a full factorial design that varied the mean pH, the standard deviation of pH fluctuations, the distribution from which pH values were drawn, the distributions of runs of constant pH, and the presence of episodic events. Fluctuating environments decreased the survival rates of resistant species, but increased survivorship of sensitive species. No one exposure duration under laboratory conditions could consistently be extrapolated to the suite of variable environments we examined. Probit analyses performed on the observed and simulated data indicated that LC-50s of different observed exposure durations were similar to each other and to the LC-50s of the simulated data assuming an exposure of 24 h. Based on these results, we recommend that toxicity studies incorporate temporal variability directly by using varying dose levels in laboratory tests.
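The core extrapolation step described above—evaluating a fitted dose–response curve under randomly varying conditions via Monte Carlo—can be sketched as follows. The Weibull form and all parameter values are hypothetical stand-ins, not the paper's fitted estimates, and "dose" is taken here as acidity below pH 7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Weibull dose-response: survival as a function of acid stress
# (dose = 7 - pH). Parameters lam and k are illustrative, not fitted values.
lam, k = 2.5, 3.0

def survival(ph):
    dose = np.clip(7.0 - ph, 0.0, None)
    return np.exp(-(dose / lam) ** k)

# Compare a constant laboratory pH with a fluctuating environment that has
# the same mean pH, as in the paper's Monte Carlo experiments.
mean_ph, sd_ph, n = 5.0, 0.5, 100_000
s_constant = survival(mean_ph)
s_fluct = survival(rng.normal(mean_ph, sd_ph, n)).mean()
```

Because the dose–response curve is nonlinear, the expected survival under fluctuating pH generally differs from survival at the mean pH, which is the effect the factorial simulations quantify.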


2021 ◽  
Vol 24 (1) ◽  
pp. 112-136
Author(s):  
Elvira Di Nardo ◽  
Federico Polito ◽  
Enrico Scalas

Abstract This paper is devoted to a fractional generalization of the Dirichlet distribution. The form of the multivariate distribution is derived assuming that the n partitions of the interval [0, W_n] are independent and identically distributed random variables following the generalized Mittag-Leffler distribution. The expected value and variance of the one-dimensional marginal are derived as well as the form of its probability density function. A related generalized Dirichlet distribution is studied that provides a reasonable approximation for some values of the parameters. The relation between this distribution and other generalizations of the Dirichlet distribution is discussed. Monte Carlo simulations of the one-dimensional marginals for both distributions are presented.
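The classical construction that this paper generalizes is worth making concrete: a Dirichlet vector arises by normalizing independent Gamma-distributed partitions; the fractional version replaces the Gamma draws with generalized Mittag-Leffler draws (whose simulation is beyond a short sketch). The shape parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

# Baseline construction: normalize i.i.d. Gamma partitions to obtain a
# Dirichlet(alpha) vector. The paper's fractional generalization swaps the
# Gamma variables for generalized Mittag-Leffler variables (not sketched).
alpha = np.array([1.0, 2.0, 3.0])     # illustrative shape parameters
n_samples = 50_000
g = rng.gamma(shape=alpha, size=(n_samples, alpha.size))
x = g / g.sum(axis=1, keepdims=True)  # each row is a Dirichlet(alpha) sample

# Marginal means should match alpha / alpha.sum() = [1/6, 2/6, 3/6].
emp_mean = x.mean(axis=0)
```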


2004 ◽  
Vol 2004 (8) ◽  
pp. 421-429 ◽  
Author(s):  
Souad Assoudou ◽  
Belkheir Essebbar

This note is concerned with Bayesian estimation of the transition probabilities of a binary Markov chain observed from heterogeneous individuals. The model is founded on the Jeffreys prior, which allows the transition probabilities to be correlated. The Bayesian estimator is approximated by means of Markov chain Monte Carlo (MCMC) techniques. The performance of the Bayesian estimates is illustrated by analyzing a small simulated data set.
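A simplified version of this setup can be sketched with independent Jeffreys Beta(1/2, 1/2) priors on each row of the transition matrix, for which the posterior mean is available in closed form. This is a deliberate simplification: the paper's Jeffreys prior additionally correlates the transition probabilities, which is why MCMC is needed there. The true transition matrix and chain length below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a binary chain with true transition matrix P[i, j] = P(next=j | cur=i).
P_true = np.array([[0.8, 0.2],
                   [0.4, 0.6]])
chain = [0]
for _ in range(5000):
    chain.append(rng.choice(2, p=P_true[chain[-1]]))
chain = np.array(chain)

# Tally transition counts n[i, j].
n = np.zeros((2, 2))
for cur, nxt in zip(chain[:-1], chain[1:]):
    n[cur, nxt] += 1

# Posterior mean under independent Jeffreys Beta(1/2, 1/2) priors per row
# (the paper's correlated Jeffreys prior has no such closed form).
post_mean = (n + 0.5) / (n.sum(axis=1, keepdims=True) + 1.0)
```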


2021 ◽  
Vol 60 (4) ◽  
pp. 513-526
Author(s):  
Bhupendra A. Raut ◽  
Robert Jackson ◽  
Mark Picel ◽  
Scott M. Collis ◽  
Martin Bergemann ◽  
...  

Abstract A robust and computationally efficient object tracking algorithm is developed by incorporating various tracking techniques. Physical properties of the objects, such as brightness temperature or reflectivity, are not considered. Therefore, the algorithm is adaptable for tracking convection-like features in simulated data and remotely sensed two-dimensional images. In this algorithm, a first guess of the motion, estimated using the Fourier phase shift, is used to predict the candidates for matching. A disparity score is computed for each target–candidate pair. The disparity also incorporates overlapping criteria in the case of large objects. Then the Hungarian method is applied to identify the best pairs by minimizing the global disparity. The high-disparity pairs are unmatched, and their target and candidate are declared expired and newly initiated objects, respectively. They are tested for merger and split on the basis of their size and overlap with the other objects. The sensitivity of track duration is shown for different disparity and size thresholds. The paper highlights the algorithm’s ability to study convective life cycles using radar and simulated data over Darwin, Australia. The algorithm skillfully tracks individual convective cells (a few pixels in size) and large convective systems. The duration of tracks and cell size are found to be lognormally distributed over Darwin. The evolution of size and precipitation types of isolated convective cells is presented in the Lagrangian perspective. This algorithm is part of a vision for a modular platform [viz., TINT is not TITAN (TINT) and Tracking and Object-Based Analysis of Clouds (tobac)] that will evolve into a sustainable choice to analyze atmospheric features.
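The matching step described above—minimizing total disparity over target–candidate pairs with the Hungarian method, then unmatching high-disparity pairs—can be sketched with SciPy's assignment solver. The disparity matrix and threshold below are made-up illustrations, not values from TINT.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical disparity matrix: rows are targets (frame t), columns are
# candidates (frame t+1); entries would combine distance, size, and overlap.
disparity = np.array([
    [0.2, 5.0, 9.0],
    [6.0, 0.4, 8.0],
    [7.0, 6.5, 9.5],   # this target has no good candidate
])

# Hungarian method: assignment minimizing the total (global) disparity.
rows, cols = linear_sum_assignment(disparity)

# Pairs above an assumed disparity threshold are unmatched: the target is
# declared expired and the candidate becomes a newly initiated object.
MAX_DISPARITY = 3.0
matched = [(r, c) for r, c in zip(rows, cols) if disparity[r, c] <= MAX_DISPARITY]
expired = [r for r, c in zip(rows, cols) if disparity[r, c] > MAX_DISPARITY]
```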


2017 ◽  
Vol 15 (2) ◽  
pp. e05R01 ◽  
Author(s):  
Francisco J. Diéguez ◽  
Manuel Cerviño ◽  
Eduardo Yus

Bovine viral diarrhea virus (BVDV), a member of the genus Pestivirus of the family Flaviviridae, causes significant losses in cattle farming worldwide because of reduced milk production, increased mortality of young animals, and reproductive, respiratory and intestinal problems. The virus is characterized by considerable genetic, and consequently antigenic and pathogenic, diversity. Knowing the variability of viral strains present in a population provides valuable information, particularly relevant for the development of control programs, vaccination recommendations and even identification of likely infection sources. Such information is therefore important at both local and regional levels. This review focuses on the genetic diversity of BVDV isolates infecting cattle in Spain in recent years. According to the published data, the most prevalent BVDV group in Spain was 1b, followed to a lesser extent by 1d, 1e and 1f. BVDV-2 has also been found in Spain, with several confirmed isolates. The studies carried out in Spain also showed increased genetic heterogeneity of BVDV strains, possibly reflecting more intensive use of the available analytical tools and increasingly large sample sizes.


Entropy ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. 998
Author(s):  
Luis Javier Herrera ◽  
Carlos José Todero Peixoto ◽  
Oresti Baños ◽  
Juan Miguel Carceller ◽  
Francisco Carrillo ◽  
...  

The study of cosmic rays remains one of the most challenging research fields in physics. Among the many open questions in this area, determining the type of primary particle for each event remains one of the most important. Cosmic-ray observatories have been working on this question for at least six decades without a definitive answer. The main obstacle is that high-energy primary particles cannot be detected directly, so Monte Carlo models and simulations are needed to characterize the generated particle cascades. This work presents results obtained with a simulated dataset generated by the Monte Carlo code CORSIKA, which simulates the interaction of high-energy particles with the atmosphere, producing a cascade of secondary particles that extends a few kilometers in diameter at ground level. Using this simulated data, a set of machine learning classifiers has been designed and trained, and their computational cost and effectiveness compared, for classifying the type of primary under ideal measuring conditions. Additionally, a feature selection algorithm has allowed the relevance of the considered features to be identified. The results confirm the importance, for this problem, of separating the electromagnetic and muonic components of the measured signal. The obtained results are quite encouraging and open new lines of work for future, more restrictive simulations.
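The classifier-comparison workflow the abstract describes can be sketched with scikit-learn. The synthetic features below are stand-ins for the CORSIKA-derived shower observables (e.g., electromagnetic vs. muonic signal fractions); the classifiers and dataset are illustrative, not the paper's actual configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for simulated shower features; the real study uses
# observables extracted from CORSIKA-simulated particle cascades.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=6,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train several classifiers and compare their held-out accuracy.
scores = {}
for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("forest", RandomForestClassifier(random_state=0))]:
    clf.fit(X_tr, y_tr)
    scores[name] = accuracy_score(y_te, clf.predict(X_te))
```

A feature-selection step, as in the paper, could be added with e.g. `sklearn.feature_selection` before training to rank the observables by relevance.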


2017 ◽  
Vol 23 (3) ◽  
pp. 618-633 ◽  
Author(s):  
Nicholas W. M. Ritchie

Abstract Secondary fluorescence, the final term in the familiar matrix correction triumvirate Z·A·F, is the most challenging for Monte Carlo models to simulate. In fact, only two implementations of Monte Carlo models commonly used to simulate electron probe X-ray spectra can calculate secondary fluorescence—PENEPMA and NIST DTSA-II (the latter is discussed herein). These two models share many physical models but there are some important differences in the way each implements X-ray emission, including secondary fluorescence. PENEPMA is based on PENELOPE, a general purpose software package for simulation of both relativistic and subrelativistic electron/positron interactions with matter. On the other hand, NIST DTSA-II was designed exclusively for simulation of X-ray spectra generated by subrelativistic electrons. NIST DTSA-II uses variance reduction techniques unsuited to a general purpose code. These optimizations help NIST DTSA-II to be orders of magnitude more computationally efficient while retaining detector position sensitivity. Simulations execute in minutes rather than hours and can model differences that result from detector position. Both PENEPMA and NIST DTSA-II are capable of handling complex sample geometries, and we will demonstrate that both are of similar accuracy when modeling experimental secondary fluorescence data from the literature.


Sensors ◽  
2020 ◽  
Vol 20 (18) ◽  
pp. 5226
Author(s):  
Subhrasankha Dey ◽  
Stephan Winter ◽  
Martin Tomko

All established models in transportation engineering that estimate the numbers of trips between origins and destinations from vehicle counts use some form of a priori knowledge of the traffic. This paper, in contrast, presents a new origin–destination flow estimation model that uses only vehicle counts observed by traffic count sensors; it requires neither historical origin–destination trip data for the estimation nor any assumed distribution of flow. This approach utilises a method of statistical origin–destination flow estimation in computer networks, and transfers the principles to the domain of road traffic by applying transport-geographic constraints in order to keep traffic embedded in physical space. Being purely stochastic, our model overcomes the conceptual weaknesses of the existing models, and additionally estimates travel times of individual vehicles. The model has been implemented for a real-world road network in the city of Melbourne, Australia. The model was validated with simulated data and real-world observations from two different data sources. The validation results show that all the origin–destination flows were estimated with good accuracy using link count data only. Additionally, the travel times estimated by the model were close approximations to the observed travel times in the real world.
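The underlying inverse problem—recovering origin–destination flows from link counts—can be illustrated on a toy network where each OD pair follows a fixed route, so that observed link counts are a linear combination of OD flows. This fixed-routing, non-negative least-squares sketch is a deliberate simplification of the paper's purely stochastic model; the network, incidence matrix, and flow values are all hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

# Toy network: 3 OD pairs, 4 sensor-equipped links. A[i, j] = 1 if OD pair j's
# route traverses link i (a fixed-routing simplification for illustration).
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
f_true = np.array([120.0, 80.0, 50.0])   # hypothetical OD flows (veh/h)
link_counts = A @ f_true                  # what the count sensors would observe

# Recover non-negative OD flows from link counts alone.
f_est, residual = nnls(A, link_counts)
```

With noisy counts and route choice uncertainty the problem becomes underdetermined, which is where the statistical machinery of the paper's model comes in.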

