GEOMAX: beyond linear compression for three-point galaxy clustering statistics

2020 ◽  
Vol 497 (1) ◽  
pp. 776-792 ◽  
Author(s):  
Davide Gualdi ◽  
Héctor Gil-Marín ◽  
Marc Manera ◽  
Benjamin Joachimi ◽  
Ofer Lahav

ABSTRACT We present the GEOMAX algorithm and its Python implementation for a two-step compression of bispectrum measurements. The first step groups bispectra by the geometric properties of their arguments; the second step then maximizes the Fisher information with respect to a chosen set of model parameters in each group. The algorithm only requires the derivatives of the data vector with respect to the parameters and a small number of mock data sets, producing an effective, non-linear compression. By applying GEOMAX to bispectrum monopole measurements from BOSS DR12 CMASS redshift-space galaxy clustering data, we reduce the 68 per cent credible intervals for the inferred parameters (b1, b2, f, σ8) by 50.4, 56.1, 33.2, and 38.3 per cent with respect to standard MCMC on the full data vector. We run the analysis and comparison between compression methods over 100 galaxy mocks to test the statistical significance of the improvements. On average, GEOMAX performs ∼15 per cent better than geometrical or maximal linear compression alone and is consistent with being lossless. Given its flexibility, the GEOMAX approach has the potential to optimally exploit three-point statistics of various cosmological probes, such as weak lensing or line-intensity maps, from current and future cosmological data sets such as DESI, Euclid, PFS, and SKA.
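The maximal linear step can be illustrated with a MOPED-style compression built from the parameter derivatives of the data vector and a mock-estimated covariance. The sketch below is a minimal Python illustration under that reading; all names are illustrative, not the GEOMAX API.

```python
import numpy as np

def moped_weights(deriv, cov):
    """MOPED-style compression vectors, one per parameter, for a single group.

    deriv : (n_params, n_data) derivatives of the mean data vector
    cov   : (n_data, n_data) covariance estimated from mocks
    """
    cinv = np.linalg.inv(cov)
    weights = []
    for mu_i in deriv:
        b = cinv @ mu_i
        # Gram-Schmidt against earlier vectors so each compressed number
        # carries independent Fisher information.
        for b_j in weights:
            b = b - (mu_i @ b_j) * b_j
        weights.append(b / np.sqrt(b @ cov @ b))  # normalize so b^T C b = 1
    return np.array(weights)

rng = np.random.default_rng(0)
mocks = rng.normal(size=(100, 30))       # 100 mocks of a 30-bin bispectrum group
cov = np.cov(mocks, rowvar=False)
deriv = rng.normal(size=(4, 30))         # d(data)/d(b1, b2, f, sigma8)
W = moped_weights(deriv, cov)
compressed = W @ mocks[0]                # 4 numbers instead of 30
```

Applied group by group, this reduces each block of the bispectrum data vector to one compressed statistic per parameter.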

Marketing ZFP ◽  
2019 ◽  
Vol 41 (4) ◽  
pp. 33-42
Author(s):  
Thomas Otter

Empirical research in marketing is often, at least in part, exploratory. The goal of exploratory research, by definition, extends beyond the empirical calibration of parameters in well-established models and includes the empirical assessment of different model specifications. In this context, researchers often rely on the statistical information about parameters in a given model to learn about likely model structures. An example is the search for the 'true' set of covariates in a regression model based on confidence intervals of regression coefficients. The purpose of this paper is to illustrate and compare different measures of statistical information about model parameters in the context of a generalized linear model: classical confidence intervals, bootstrapped confidence intervals, and Bayesian posterior credible intervals from a model that adapts its dimensionality as a function of the information in the data. I find that inference from the adaptive Bayesian model dominates that based on classical and bootstrapped intervals in a given model.
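A minimal sketch of two of the interval types compared, on an illustrative Poisson GLM: the classical asymptotic (Wald) interval and a nonparametric bootstrap percentile interval. Data, model, and replication count are placeholders.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.poisson(np.exp(X @ np.array([0.3, 0.5, 0.0])))  # third covariate is pure noise

# Classical Wald interval from the asymptotic covariance of the MLE.
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print("classical 95% CI:\n", fit.conf_int())

# Bootstrap percentile interval: refit the model on resampled rows.
boot = []
for _ in range(999):
    idx = rng.integers(0, n, size=n)
    boot.append(sm.GLM(y[idx], X[idx], family=sm.families.Poisson()).fit().params)
print("bootstrap 95% CI:\n", np.percentile(np.array(boot), [2.5, 97.5], axis=0).T)
```

Whether the interval for the noise coefficient covers zero is exactly the kind of evidence exploratory covariate selection leans on.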


2020 ◽  
Vol 501 (1) ◽  
pp. 994-1001
Author(s):  
Suman Sarkar ◽  
Biswajit Pandey ◽  
Snehasish Bhattacharjee

ABSTRACT We use an information theoretic framework to analyse data from the Galaxy Zoo 2 project and study whether there are any statistically significant correlations between the presence of bars in spiral galaxies and their environment. We measure the mutual information between the barredness of galaxies and their environments in a volume limited sample (Mr ≤ −21) and compare it with the same quantity in data sets where (i) the bar/unbar classifications are randomized and (ii) the spatial distribution of galaxies is shuffled on different length scales. We assess the statistical significance of the differences in the mutual information using a t-test and find that neither randomization of the morphological classifications nor shuffling of the spatial distribution alters the mutual information in a statistically significant way. The non-zero mutual information between barredness and environment arises from the finite and discrete nature of the data set and can be entirely explained by mock Poisson distributions. We also separately compare the cumulative distribution functions of the barred and unbarred galaxies as a function of their local density. Using a Kolmogorov–Smirnov test, we find that the null hypothesis cannot be rejected even at the 75 per cent confidence level. Our analysis indicates that environments do not play a significant role in the formation of a bar, which is largely determined by the internal processes of the host galaxy.
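A minimal sketch of the randomization part of such an analysis: mutual information between binary bar labels and a binned local-density estimate, compared against shuffled labels to build a null distribution. Inputs are placeholders, and the z-score below stands in for the paper's t-test.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(1)
barred = rng.integers(0, 2, size=5000)        # placeholder bar/unbar classifications
log_density = rng.normal(size=5000)           # placeholder environment measure
edges = np.quantile(log_density, np.linspace(0, 1, 11)[1:-1])
env_bins = np.digitize(log_density, edges)    # 10 equal-occupancy density bins

mi_data = mutual_info_score(barred, env_bins)

# Null distribution: shuffling the labels destroys any real correlation, so the
# residual MI reflects only the finite, discrete nature of the sample.
mi_null = np.array([mutual_info_score(rng.permutation(barred), env_bins)
                    for _ in range(1000)])
z = (mi_data - mi_null.mean()) / mi_null.std()
print(f"MI = {mi_data:.4e}, null = {mi_null.mean():.4e}, z = {z:.2f}")
```

A data MI consistent with the shuffled null is the signature the authors report: non-zero mutual information that is entirely a finite-sample effect.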


Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1850
Author(s):  
Rashad A. R. Bantan ◽  
Farrukh Jamal ◽  
Christophe Chesneau ◽  
Mohammed Elgarhy

Unit distributions are commonly used in probability and statistics to describe useful quantities with values between 0 and 1, such as proportions, probabilities, and percentages. Some unit distributions are defined in a natural analytical manner, while others are derived by transforming an existing distribution defined on a larger domain. In this article, we introduce the unit gamma/Gompertz distribution, founded on the inverse-exponential scheme and the gamma/Gompertz distribution. The gamma/Gompertz distribution is known to be a very flexible three-parameter lifetime distribution, and we aim to transpose this flexibility to the unit interval. First, we check this aspect through the analytical behavior of the primary functions. It is shown that the probability density function can be increasing, decreasing, “increasing-decreasing”, or “decreasing-increasing”, with pliant asymmetric properties. On the other hand, the hazard rate function can have monotonically increasing, decreasing, or constant shapes. We complete the theoretical part with some propositions on stochastic ordering, moments, quantiles, and the reliability coefficient. Practically, the maximum likelihood method is used to estimate the model parameters from unit data. We present some simulation results to evaluate this method. Two applications using real data sets, one on trade shares and the other on flood levels, demonstrate the importance of the new model when compared to other unit models.
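A minimal sketch of the construction, assuming the inverse-exponential scheme means Y = exp(−X) with X gamma/Gompertz distributed (parameters b, s, β as in the base law); the closed forms below follow from that assumption and may differ from the paper's exact notation.

```python
import numpy as np

def ugg_pdf(y, b, s, beta):
    """Density of Y = exp(-X) for X ~ gamma/Gompertz(b, s, beta), 0 < y < 1."""
    return s * b * beta**s * y**(-b - 1) / (beta - 1 + y**(-b))**(s + 1)

def ugg_cdf(y, b, s, beta):
    # P(Y <= y) = P(X >= -log y) = 1 - F_X(-log y), which simplifies to:
    return beta**s / (beta - 1 + y**(-b))**s

def ugg_rvs(size, b, s, beta, seed=None):
    # Inverse-transform sampling from the closed-form quantile function.
    u = np.random.default_rng(seed).uniform(size=size)
    return (beta * u**(-1.0 / s) - beta + 1.0)**(-1.0 / b)

# Quick self-check: the empirical CDF of draws should track the analytic CDF.
y = ugg_rvs(100_000, b=1.5, s=0.8, beta=2.0, seed=0)
print(abs(np.mean(y <= 0.5) - ugg_cdf(0.5, 1.5, 0.8, 2.0)))  # should be ~0
```

The same log-density, summed over observations, is what a maximum likelihood routine would optimize for the applications mentioned.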


2020 ◽  
Vol 70 (1) ◽  
pp. 145-161 ◽  
Author(s):  
Marnus Stoltz ◽  
Boris Baeumer ◽  
Remco Bouckaert ◽  
Colin Fox ◽  
Gordon Hiscott ◽  
...  

Abstract We describe a new and computationally efficient Bayesian methodology for inferring species trees and demographics from unlinked binary markers. Likelihood calculations are carried out using diffusion models of allele frequency dynamics combined with novel numerical algorithms. The diffusion approach allows for the analysis of data sets containing hundreds or thousands of individuals. The method, which we call Snapper, has been implemented as part of the BEAST2 package. We conducted simulation experiments to assess numerical error, computational requirements, and accuracy in recovering known model parameters. A reanalysis of soybean SNP data demonstrates that the models implemented in Snapp and Snapper can be difficult to distinguish in practice, a characteristic which we tested with further simulations. We demonstrate the scale of analysis possible using a SNP data set sampled from 399 freshwater turtles in 41 populations. [Bayesian inference; diffusion models; multi-species coalescent; SNP data; species trees; spectral methods.]
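A minimal sketch of the model structure (not Snapper's spectral solver): allele frequencies diffuse along a branch under the neutral Wright-Fisher diffusion, and the observed allele counts are binomial draws given the tip frequency. The crude Euler-Maruyama Monte Carlo below is purely illustrative, and all names are hypothetical.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(2)

def simulate_wf(x0, t, N=1000, n_paths=10_000, dt=1e-3):
    """Euler-Maruyama paths of the neutral Wright-Fisher diffusion
    dx = sqrt(x(1-x)/(2N)) dW, started at frequency x0 and run for time t."""
    x = np.full(n_paths, float(x0))
    for _ in range(int(t / dt)):
        var = np.clip(x * (1 - x), 0.0, None) / (2 * N)
        x = np.clip(x + np.sqrt(var * dt) * rng.normal(size=n_paths), 0.0, 1.0)
    return x

# Monte Carlo likelihood of observing k derived alleles among n sampled
# chromosomes at the tip of a branch of length t = 0.5.
x_tip = simulate_wf(x0=0.3, t=0.5)
k, n = 7, 20
print("P(k | model) ≈", binom.pmf(k, n, x_tip).mean())
```

Snapper replaces this sampling with a deterministic spectral representation of the transition density, which is what makes likelihoods tractable for hundreds of individuals.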


2018 ◽  
Vol 612 ◽  
pp. A70 ◽  
Author(s):  
J. Olivares ◽  
E. Moraux ◽  
L. M. Sarro ◽  
H. Bouy ◽  
A. Berihuete ◽  
...  

Context. Membership analyses of the DANCe and Tycho + DANCe data sets provide the largest and least contaminated sample of Pleiades candidate members to date. Aims. We aim to reassess the different proposals for the number surface density of the Pleiades in the light of this new and most complete list of candidate members, and to infer the parameters of the most adequate model. Methods. We compute the Bayesian evidence and Bayes factors for variations of the classical radial models. These include elliptical symmetry and luminosity segregation. As a by-product of the model comparison, we obtain posterior distributions for each set of model parameters. Results. We find that the model comparison results depend on the spatial extent of the region used for the analysis. For a circle of 11.5 parsecs around the cluster centre (the most homogeneous and complete region), we find no compelling reason to abandon King’s model, although the Generalised King model introduced here has slightly better fitting properties. Furthermore, we find strong evidence against radially symmetric models when compared to the elliptic extensions. Finally, we find that including mass segregation, in the form of luminosity segregation in the J band, is strongly supported in all our models. Conclusions. We have put the question of the projected spatial distribution of the Pleiades cluster on a solid probabilistic footing, and inferred its properties using the most exhaustive and least contaminated list of Pleiades candidate members available to date. Our results suggest, however, that this sample may still lack about 20% of the expected number of cluster members. Therefore, this study should be revised when the completeness and homogeneity of the data can be extended beyond the 11.5 parsec limit. Such a study will allow for a more precise determination of the Pleiades spatial distribution, its tidal radius, ellipticity, number of objects, and total mass.
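A minimal sketch of the kind of computation involved: the King (1962) surface-density profile as a likelihood for projected member radii, with the Bayesian evidence approximated by brute-force averaging of the likelihood over a uniform prior grid. Radii, priors, and grids are placeholders, not the paper's setup.

```python
import numpy as np

def king_density(r, rc, rt):
    """Unnormalized King surface density, zero beyond the tidal radius rt."""
    k = (1/np.sqrt(1 + (r/rc)**2) - 1/np.sqrt(1 + (rt/rc)**2))**2
    return np.where(r < rt, k, 0.0)

def log_like(radii, rc, rt):
    # Normalize over the 11.5 pc analysis circle so the density is a proper pdf
    # in r (including the 2*pi*r Jacobian of the projected radial coordinate).
    grid = np.linspace(1e-3, 11.5, 2000)
    norm = np.trapz(2*np.pi*grid*king_density(grid, rc, rt), grid)
    dens = 2*np.pi*radii*king_density(radii, rc, rt) / norm
    return np.sum(np.log(np.clip(dens, 1e-300, None)))

rng = np.random.default_rng(3)
radii = rng.uniform(0.1, 10.0, size=500)       # placeholder projected radii in pc

# Crude evidence: mean likelihood over uniform priors rc ~ U(0.5,5), rt ~ U(10,40).
rc_grid, rt_grid = np.linspace(0.5, 5, 40), np.linspace(10, 40, 40)
logL = np.array([[log_like(radii, rc, rt) for rc in rc_grid] for rt in rt_grid])
logZ = logL.max() + np.log(np.mean(np.exp(logL - logL.max())))
print("log-evidence ≈", logZ)
```

Repeating the same integral for an elliptical or luminosity-segregated variant and differencing the log-evidences gives the Bayes factor used to rank the models.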


2014 ◽  
Vol 70 (a1) ◽  
pp. C344-C344
Author(s):  
Silvia Russi ◽  
Shawn Kann ◽  
Henry van den Bedem ◽  
Ana M. González

Protein crystallography data collection at synchrotrons today is routinely carried out at cryogenic temperatures to mitigate radiation damage to the crystal. Although damage still takes place at 100 K and below, the immobilization of free radicals increases the lifetime of the crystals by orders of magnitude. Increasingly, however, experiments are carried out at room temperature, for reasons that include the lack of adequate cryo-protectants, the lattice changes or internal disorder induced by the cooling process, and the convenience of collecting data directly from the crystallization plates. Moreover, recent studies have shown that flash-freezing affects the conformational ensemble of crystal structures [1] and can hide important functional mechanisms from observation [2]. While there has been a considerable amount of effort in studying radiation damage at cryo-temperatures, its effects at room temperature are still not well understood. We investigated the effects of data collection temperature on secondary local damage to the side chains and main chains of different proteins. Data were collected from crystals of thaumatin and lysozyme at 100 K and at room temperature. To carefully control the total absorbed dose, full data sets at room temperature were assembled from a few diffraction images per crystal. Several data sets were collected at increasing levels of absorbed dose. Our analysis shows that while at cryogenic temperatures radiation damage increases the conformational variability, at room temperature it has the opposite effect. We also observed that disulfide bonds appear to break at a different relative rate at room temperature, perhaps because of a more active repair mechanism. Our analysis suggests that the elevated conformational heterogeneity in crystal structures at room temperature is observed despite radiation damage, and not as a result thereof.


2017 ◽  
Vol 3 (5) ◽  
pp. e192 ◽  
Author(s):  
Corina Anastasaki ◽  
Stephanie M. Morris ◽  
Feng Gao ◽  
David H. Gutmann

Objective: To ascertain the relationship between the germline NF1 gene mutation and glioma development in patients with neurofibromatosis type 1 (NF1). Methods: The relationship between the type and location of the germline NF1 mutation and the presence of a glioma was analyzed in 37 participants with NF1 from one institution (Washington University School of Medicine [WUSM]) with a clinical diagnosis of NF1. Odds ratios (ORs) were calculated using both unadjusted and weighted analyses of this data set in combination with 4 previously published data sets. Results: While no statistically significant association was observed between the location and type of the NF1 mutation and glioma in the WUSM cohort, power calculations revealed that a sample size of 307 participants would be required to determine the predictive value of the position or type of the NF1 gene mutation. Combining our data set with 4 previously published data sets (n = 310), children with glioma were found to be more likely to harbor 5′-end gene mutations (OR = 2; p = 0.006). Moreover, while not clinically predictive due to insufficient sensitivity and specificity, this association with glioma was stronger for participants with 5′-end truncating (OR = 2.32; p = 0.005) or 5′-end nonsense (OR = 3.93; p = 0.005) mutations relative to those without glioma. Conclusions: Individuals with NF1 and glioma are more likely to harbor nonsense mutations in the 5′ end of the NF1 gene, suggesting that the NF1 mutation may be one predictive factor for glioma in this at-risk population.
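A minimal sketch of the unadjusted odds-ratio calculation on a 2×2 table of 5′-end mutation status against glioma status; the counts are placeholders, not the study's data.

```python
import numpy as np
from scipy.stats import fisher_exact

table = np.array([[40, 30],    # 5'-end mutation:  glioma, no glioma
                  [60, 180]])  # other mutation:   glioma, no glioma
odds_ratio, p_value = fisher_exact(table)
print(f"OR = {odds_ratio:.2f}, p = {p_value:.4f}")

# 95% CI via the standard normal approximation on the log odds ratio.
log_or = np.log(table[0, 0] * table[1, 1] / (table[0, 1] * table[1, 0]))
se = np.sqrt((1 / table.astype(float)).sum())
print("95% CI:", np.exp(log_or + np.array([-1.96, 1.96]) * se))
```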


2020 ◽  
Vol 9 (1) ◽  
pp. 61-81
Author(s):  
Lazhar BENKHELIFA

A new lifetime model with four positive parameters, called the Weibull Birnbaum-Saunders distribution, is proposed. The proposed model extends the Birnbaum-Saunders distribution and provides great flexibility in modeling data in practice. Some mathematical properties of the new distribution are obtained, including expansions for the cumulative and density functions, moments, the generating function, mean deviations, order statistics, and reliability. Estimation of the model parameters is carried out by the maximum likelihood method. A simulation study is presented to show the performance of the maximum likelihood estimates of the model parameters. The flexibility of the new model is examined by applying it to two real data sets.
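A minimal sketch of the likelihood estimation, assuming the Weibull-G construction F(x) = 1 − exp(−λ[G(x)/(1−G(x))]^k) with G the Birnbaum-Saunders CDF (scipy's fatiguelife); the paper's exact parameterization may differ.

```python
import numpy as np
from scipy.stats import fatiguelife
from scipy.optimize import minimize

def neg_log_like(log_params, x):
    lam, k, a, b = np.exp(log_params)    # log-parameterization keeps all four > 0
    G = np.clip(fatiguelife.cdf(x, a, scale=b), 1e-12, 1 - 1e-12)
    g = fatiguelife.pdf(x, a, scale=b)
    r = G / (1 - G)                      # odds transform of the baseline cdf
    # log density of the Weibull-G family with Birnbaum-Saunders baseline
    logf = np.log(lam * k * g) + (k - 1) * np.log(r) - 2 * np.log1p(-G) - lam * r**k
    return -np.sum(logf)

x = fatiguelife.rvs(0.5, scale=2.0, size=300, random_state=4)  # placeholder data
res = minimize(neg_log_like, x0=np.zeros(4), args=(x,), method="Nelder-Mead")
print("MLE (lambda, k, alpha, beta):", np.exp(res.x))
```

A simulation study of the kind the abstract describes would repeat this fit over many synthetic samples and report bias and mean squared error of the estimates.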


Author(s):  
Kaitlyn Johnson ◽  
Grant R. Howard ◽  
Daylin Morgan ◽  
Eric A. Brenner ◽  
Andrea L. Gardner ◽  
...  

Summary A significant challenge in the field of biomedicine is the development of methods to integrate the multitude of dispersed data sets into comprehensive frameworks that can be used to generate optimal clinical decisions. Recent technological advances in single cell analysis allow for high-dimensional molecular characterization of cells and populations, but to date, few mathematical models have attempted to integrate measurements from the single cell scale with other data types. Here, we present a framework that actionizes static outputs from a machine learning model and leverages them as measurements of state variables in a dynamic, mechanistic model of treatment response. We apply this framework to breast cancer cells to integrate single cell transcriptomic data with longitudinal population-size data. We demonstrate that the explicit inclusion of the transcriptomic information in the parameter estimation is critical for identification of the model parameters and enables accurate prediction of new treatment regimens. Inclusion of the transcriptomic data improves predictive accuracy on new treatment response dynamics, with a concordance correlation coefficient (CCC) of 0.89, compared to CCC = 0.79 without integrating the single cell RNA sequencing (scRNA-seq) data directly into the model calibration. To the best of our knowledge, this is the first work that explicitly integrates single cell, clonally resolved transcriptome data sets with longitudinal treatment response data in a mechanistic mathematical model of drug resistance dynamics. We anticipate this approach to be a first step demonstrating the feasibility of incorporating multimodal data sets into identifiable mathematical models to develop optimized treatment regimens from data.
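A minimal sketch of the integration idea: a two-subpopulation (sensitive/resistant) growth model fit to longitudinal cell counts, with the scRNA-seq-derived resistant fraction entering the objective as an extra measurement of the initial state. Model form, data, and weights are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.optimize import least_squares

t = np.array([0., 1., 2., 3., 4., 5.])                   # days after treatment
counts = np.array([1000., 900., 870., 950., 1150., 1500.])  # placeholder cell counts
phi_scrnaseq = 0.15          # resistant fraction from a single-cell classifier

def model(params, t, N0):
    gs, gr, phi0 = params    # sensitive/resistant net growth rates, initial fraction
    return N0 * (1 - phi0) * np.exp(gs * t) + N0 * phi0 * np.exp(gr * t)

def residuals(params):
    fit = model(params, t, counts[0])
    # Joint objective: population-size residuals plus the single-cell constraint
    # on phi0. Without the second term, phi0 is only weakly identifiable from
    # bulk counts alone.
    return np.r_[(fit - counts) / counts.std(), 10.0 * (params[2] - phi_scrnaseq)]

sol = least_squares(residuals, x0=[-0.3, 0.2, 0.5], bounds=([-2, -2, 0], [2, 2, 1]))
print("gs, gr, phi0 =", sol.x)
```

Dropping the scRNA-seq residual and refitting illustrates the identifiability gap the abstract describes: several (gs, gr, phi0) combinations then fit the counts almost equally well.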


2016 ◽  
Author(s):  
Kassian Kobert ◽  
Alexandros Stamatakis ◽  
Tomáš Flouri

The phylogenetic likelihood function is the major computational bottleneck in several applications of evolutionary biology, such as phylogenetic inference, species delimitation, model selection, and divergence time estimation. Given the alignment, a tree, and the evolutionary model parameters, the likelihood function computes the conditional likelihood vectors for every node of the tree. Vector entries for which all input data are identical result in redundant likelihood operations which, in turn, yield identical conditional values. Such operations can be omitted to improve run-time and, using appropriate data structures, to reduce memory usage. We present a fast, novel method for identifying and omitting such redundant operations in phylogenetic likelihood calculations, and assess the performance improvement and memory savings attained by our method. Using empirical and simulated data sets, we show that a prototype implementation of our method yields up to 10-fold speedups and uses up to 78% less memory than one of the fastest and most highly tuned implementations of the phylogenetic likelihood function currently available. Our method is generic and can seamlessly be integrated into any phylogenetic likelihood implementation.
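The simplest instance of this redundancy is repeated alignment columns: identical site patterns yield identical conditional likelihoods, so they can be collapsed into unique patterns with multiplicity weights before any likelihood pass, as in the sketch below. The authors' method generalizes this idea to repeated subtree patterns at internal nodes.

```python
import numpy as np

def compress_sites(alignment):
    """alignment: (n_taxa, n_sites) array of integer-encoded states.
    Returns the unique site patterns (columns) and their multiplicities."""
    patterns, weights = np.unique(alignment, axis=1, return_counts=True)
    return patterns, weights

aln = np.array([[0, 1, 0, 0, 2, 1],
                [0, 1, 0, 0, 3, 1],
                [1, 2, 1, 1, 0, 2]])
patterns, w = compress_sites(aln)
print(patterns.shape[1], "unique patterns out of", aln.shape[1], "sites; weights:", w)
# The total log-likelihood is then sum_i w[i] * log L(pattern_i): each repeated
# column is computed once and weighted, exactly the saving the node-level method extends.
```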

