Information Loss Due to the Data Reduction of Sample Data from Discrete Distributions

Data, 2020, Vol 5 (3), pp. 84
Author(s): Maryam Moghimi, Herbert W. Corley

In this paper, we study the information lost when a real-valued statistic is used to reduce or summarize sample data from a discrete random variable with a one-dimensional parameter. We compare the probability that a random sample gives a particular data set to the probability of the statistic’s value for this data set. We focus on sufficient statistics for the parameter of interest and develop a general formula, independent of the parameter, for the Shannon information lost when a data sample is reduced to such a summary statistic. We also develop a measure of entropy for this lost information that depends only on the real-valued statistic but on neither the parameter nor the data. Our approach would also work for non-sufficient statistics, but the lost information and associated entropy would involve the parameter. The method is applied to three well-known discrete distributions to illustrate its implementation.
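As a concrete illustration of such a parameter-free loss (a minimal sketch in the Bernoulli case, not taken from the paper): for n i.i.d. Bernoulli trials, the sufficient statistic t = number of successes makes every arrangement of the sample equally likely given t, so reducing the sample to t discards exactly log2 C(n, t) bits, regardless of the success probability p.

```python
from math import comb, log2

def lost_information_bits(n: int, t: int) -> float:
    """Shannon information (bits) lost when a Bernoulli sample of size n
    is reduced to its sufficient statistic t = number of successes.
    Given T = t, all C(n, t) arrangements are equally likely, so the
    conditional probability of the full sample is 1 / C(n, t) and the
    lost information log2 C(n, t) does not involve the parameter p."""
    return log2(comb(n, t))

# Example: a sample of 10 Bernoulli trials with 4 successes
print(lost_information_bits(10, 4))  # log2(210) ≈ 7.714 bits
```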

2006, Vol 18 (10), pp. 2387-2413
Author(s): Gaby Schneider, Martha N. Havenith, Danko Nikolić

The analysis of neuronal information involves the detection of spatiotemporal relations between neuronal discharges. We propose a method that is based on the positions (phase offsets) of the central peaks obtained from pairwise cross-correlation histograms. Data complexity is reduced to a one-dimensional representation by using redundancies in the measured phase offsets, such that each unit is assigned a “preferred firing time” relative to the other units in the group. We propose two procedures to examine the applicability of this method to experimental data sets. In addition, we present methods that help investigate dynamic changes in the preferred firing times of the units. All methods are applied to a sample data set obtained from cat visual cortex.
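A minimal sketch of one way such a one-dimensional reduction can be computed, assuming the measured offsets are approximately additive (d[i, j] ≈ t_j − t_i); the function name and the least-squares formulation are illustrative, not the authors' exact procedure:

```python
import numpy as np

def preferred_firing_times(d: np.ndarray) -> np.ndarray:
    """Assign each unit a preferred firing time t_i from a matrix of
    pairwise phase offsets d[i, j] ≈ t_j - t_i (e.g. central-peak
    positions of cross-correlation histograms), via least squares."""
    n = d.shape[0]
    rows, rhs = [], []
    for i in range(n):
        for j in range(n):
            if i != j:
                r = np.zeros(n)
                r[j], r[i] = 1.0, -1.0
                rows.append(r)
                rhs.append(d[i, j])
    # Fix the gauge freedom: offsets determine times only up to a shift
    anchor = np.zeros(n)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    t, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return t

# Consistent toy offsets for three units with times [0, 2, 5] ms
d = np.array([[0.0, 2.0, 5.0],
              [-2.0, 0.0, 3.0],
              [-5.0, -3.0, 0.0]])
print(preferred_firing_times(d))  # ≈ [0.0, 2.0, 5.0]
```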



2021, pp. 014544552110540
Author(s): Nihal Sen

The purpose of this study is to provide a brief introduction to effect size calculation in single-subject design studies, including a description of nonparametric and regression-based effect sizes. We then focus the rest of the tutorial on common regression-based methods used to calculate effect size in single-subject experimental studies. We start by describing the differences between five regression-based methods (Gorsuch; White et al.; Center et al.; Allison and Gorman; Huitema and McKean). This is followed by an example using the five regression-based effect size methods and a demonstration of how they can be applied to a sample data set. In this way, we show how the values obtained from different effect size methods differ. We present the specific regression models used in these five methods and show how they can be obtained from the SPSS program. R² values obtained from the five methods were converted to Cohen’s d values and compared. The d values obtained from the same data set were estimated as 0.003, 0.357, 2.180, 3.470, and 2.108 for the Allison and Gorman, Gorsuch, White et al., Center et al., and Huitema and McKean methods, respectively. A brief description of selected statistical programs available for conducting regression-based methods is also given.
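For readers wanting to reproduce the conversion step, the following sketch uses one common R²-to-d transformation, d = 2r/√(1 − r²) with r = √R²; the tutorial itself may use a different variant:

```python
from math import sqrt

def r2_to_cohens_d(r2: float) -> float:
    """Convert an R² value to Cohen's d via the common transformation
    d = 2r / sqrt(1 - r²), where r = sqrt(R²). This is one standard
    conversion; other variants exist."""
    r = sqrt(r2)
    return 2 * r / sqrt(1 - r2)

for r2 in (0.10, 0.50, 0.75):
    print(f"R² = {r2:.2f} -> d ≈ {r2_to_cohens_d(r2):.3f}")
```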


Entropy, 2018, Vol 20 (8), pp. 601
Author(s): Paul Darscheid, Anneli Guthke, Uwe Ehret

When constructing discrete (binned) distributions from samples of a data set, there are applications where it is desirable to ensure that all bins of the sample distribution have nonzero probability: for example, if the sample distribution is part of a predictive model that must return a response for the entire codomain, or if we use Kullback–Leibler divergence to measure the (dis-)agreement of the sample distribution and the original distribution of the variable, which is otherwise inconveniently infinite. Several sample-based distribution estimators exist that assure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and converges with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with Kullback–Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero method, the simple “add one counter” method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
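A minimal sketch of the idea as described above, assuming the per-bin Clopper–Pearson interval is computed from the beta distribution and the interval means are renormalized; details may differ from the paper's implementation:

```python
import numpy as np
from scipy.stats import beta

def nonzero_bin_probs(counts, alpha=0.05):
    """Strictly positive bin-probability estimates from sample counts:
    take the mean of each bin's Clopper-Pearson confidence interval,
    then renormalize. A sketch of the idea described in the abstract."""
    k = np.asarray(counts, dtype=int)
    n = k.sum()
    # Clopper-Pearson bounds via the beta distribution; the np.maximum
    # guards only avoid invalid beta parameters at the k=0 / k=n edges,
    # where np.where substitutes the exact bounds 0 and 1.
    lower = np.where(k > 0,
                     beta.ppf(alpha / 2, np.maximum(k, 1), n - k + 1), 0.0)
    upper = np.where(k < n,
                     beta.ppf(1 - alpha / 2, k + 1, np.maximum(n - k, 1)), 1.0)
    mid = (lower + upper) / 2  # strictly positive, even for empty bins
    return mid / mid.sum()     # renormalize to a distribution

print(nonzero_bin_probs([0, 3, 7, 0]))  # every entry is nonzero
```

For very small n the intervals are wide and nearly equal across bins, so the renormalized means approach the uniform distribution, matching the maximum-entropy behavior noted above.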


2018
Author(s): PierGianLuca Porta Mana, Claudia Bachmann, Abigail Morrison

Automated classification methods for disease diagnosis are currently in the limelight, especially for imaging data. Classification does not fully meet a clinician's needs, however: in order to combine the results of multiple tests and decide on a course of treatment, a clinician needs the likelihood of a given health condition rather than the binary classification such methods yield. We illustrate how likelihoods can be derived step by step from first principles and approximations, and how they can be assessed and selected, using fMRI data from a publicly available data set containing schizophrenic and healthy control subjects as a working example. We start from the basic assumption of partial exchangeability and then invoke the notion of sufficient statistics and the "method of translation" (Edgeworth, 1898) combined with conjugate priors. The resulting likelihood can also be used to compare different data-reduction algorithms. Despite the simplifications and possibly unrealistic assumptions used to illustrate the method, we obtain classification results comparable to previous, more realistic studies of schizophrenia, whilst yielding likelihoods that can naturally be combined with the results of other diagnostic tests.
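As a toy illustration of a conjugate-prior likelihood of this general flavor (a sketch with invented scalar summaries, not the authors' fMRI pipeline): under a vague conjugate prior for a normal model, the posterior predictive of a new summary value is Student-t, giving a per-class likelihood that can be combined into a likelihood ratio:

```python
import numpy as np
from scipy.stats import t as student_t

def predictive_logpdf(x_new, train):
    """Log posterior-predictive density of a new scalar summary under a
    normal model with a vague conjugate prior: a Student-t centered on
    the training mean (standard conjugate-analysis result)."""
    n = len(train)
    m, s2 = np.mean(train), np.var(train, ddof=1)
    return student_t.logpdf(x_new, df=n - 1, loc=m,
                            scale=np.sqrt(s2 * (1 + 1 / n)))

# Hypothetical scalar reductions of the imaging data for each group
healthy = np.array([0.8, 1.1, 0.9, 1.0, 1.2])
patients = np.array([1.6, 1.9, 1.7, 2.1, 1.8])
x_new = 1.5  # summary value for a new subject
log_lr = predictive_logpdf(x_new, patients) - predictive_logpdf(x_new, healthy)
print(f"log likelihood ratio (condition vs. healthy): {log_lr:.2f}")
```

Unlike a hard classifier, the log likelihood ratio can be added to the evidence from other, independent diagnostic tests before a decision is made.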


2018, Vol 38 (1), pp. 77-101
Author(s): Palaniappan Vellaisamy, Aditya Maheshwari

In this paper, we define a fractional negative binomial process (FNBP) by replacing the Poisson process with a fractional Poisson process (FPP) in the gamma-subordinated form of the negative binomial process. It is shown that the one-dimensional distributions of the FPP and the FNBP are not infinitely divisible. Also, the space fractional Pólya process (SFPP) is defined by replacing the rate parameter λ with a gamma random variable in the definition of the space fractional Poisson process. The properties of the FNBP and the SFPP, and the connections to the PDEs governing their densities, are also investigated.
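In symbols (notation mine, sketching only the constructions the abstract describes):

```latex
% Negative binomial process in gamma-subordinated form:
X(t) = N(\Gamma(t)),
% with N a Poisson process and \Gamma(t) an independent gamma subordinator.
% FNBP: replace N by a fractional Poisson process N_\beta:
X_\beta(t) = N_\beta(\Gamma(t)).
% SFPP: in the space fractional Poisson process \widetilde{N}_\alpha(t;\lambda),
% replace the rate \lambda by a gamma random variable G:
\widetilde{X}_\alpha(t) = \widetilde{N}_\alpha(t; G).
```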


2010, Vol 4 (1), pp. 35-51
Author(s): H.-W. Jacobi, F. Domine, W. R. Simpson, T. A. Douglas, M. Sturm

Abstract. The specific surface area (SSA) of snow is a powerful parameter for quantifying the exchange of matter and energy between the snow and the atmosphere. However, no current snow physics model can simulate the SSA. Therefore, two different types of empirical parameterizations of the snow SSA are implemented into the existing one-dimensional snow physics model CROCUS. The parameterizations are based either on diagnostic equations relating the SSA to parameters like snow type and density, or on prognostic equations that describe the change of SSA depending on snow age, snowpack temperature, and the temperature gradient within the snowpack. Simulations with the upgraded CROCUS model were performed for a subarctic snowpack, for which an extensive data set including SSA measurements is available at Fairbanks, Alaska, for the winter season 2003/2004. While a reasonable agreement between simulated and observed SSA values is obtained with both parameterizations, the model tends to overestimate the SSA. This overestimation is more pronounced for the diagnostic equations than for the prognostic equations. Part of the SSA deviations for both parameterizations can be attributed to differences between simulated and observed snow heights, densities, and temperatures. Therefore, further sensitivity studies regarding the thermal budget of the snowpack were performed. They revealed that reducing the thermal conductivity of the snow or increasing the turbulent fluxes at the snow surface leads to a slight improvement of the simulated thermal budget of the snowpack compared to the observations. However, the impact on further simulated parameters like snow height and SSA remains small. Including additional physical processes in the snow model may have the potential to advance the simulations of the thermal budget of the snowpack and, thus, the SSA simulations.
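As a rough illustration of what a prognostic SSA parameterization of the kind described above looks like (all functional forms and coefficients below are hypothetical placeholders, not CROCUS values): SSA decays with snow age, faster in warm snow and under strong temperature gradients:

```python
from math import exp

def ssa_step(ssa, temp_c, grad_k_per_m, dt_days, k0=0.01, a=0.05, b=0.002):
    """One time step of a hypothetical prognostic SSA evolution: SSA
    decays exponentially with age, with a rate that increases with
    snowpack temperature and with the temperature gradient."""
    rate = k0 * exp(a * temp_c) * (1.0 + b * abs(grad_k_per_m))
    return ssa * exp(-rate * dt_days)

ssa = 80.0  # fresh snow SSA, m^2/kg (typical order of magnitude)
for _ in range(10):
    ssa = ssa_step(ssa, temp_c=-10.0, grad_k_per_m=20.0, dt_days=1.0)
print(f"SSA after 10 days: {ssa:.1f} m^2/kg")
```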


Filomat, 2020, Vol 34 (2), pp. 543-549
Author(s): Buket Simsek

The aim of the present paper is to establish and study a generating function associated with a characteristic function for the Bernstein polynomials. Using this function, we derive many identities, relations, and formulas relevant to the moments of a discrete random variable for the Bernstein polynomials (the binomial distribution), Bernoulli numbers of negative order, Euler numbers of negative order, and the Stirling numbers.
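For context, the link to the binomial distribution rests on the standard identification of the Bernstein basis with the binomial pmf (a well-known fact, not a result specific to this paper):

```latex
% The Bernstein basis coincides with the binomial pmf with p = x:
B_k^n(x) = \binom{n}{k} x^k (1-x)^{n-k} = \Pr(X = k), \qquad X \sim \mathrm{Binomial}(n, x),
% so sums against the basis generate the moments of X:
\mathbb{E}[X^m] = \sum_{k=0}^{n} k^m \, B_k^n(x).
```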


Filomat, 2018, Vol 32 (17), pp. 5931-5947
Author(s): Mojtaba Hatami, Hossein Alamatsaz

In this paper, we propose a new transformation of circular random variables based on circular distribution functions, which we call the inverse distribution function (idf) transformation. We show that the Möbius transformation is a special case of our idf transformation. Very general results are provided for the properties of the proposed family of idf transformations, including their trigonometric moments, maximum entropy, random variate generation, finite mixture, and modality properties. In particular, we focus our attention on a subfamily of the general family in which the idf transformation is based on the cardioid circular distribution function. Modality and shape properties are investigated for this subfamily. In addition, we obtain further statistical properties of the resulting distribution by applying the idf transformation to a random variable following a von Mises distribution. In fact, we introduce the cardioid-von Mises (CvM) distribution and estimate its parameters by the maximum likelihood method. Finally, an application of the CvM family and its inferential methods is illustrated using a real data set containing times of gun crimes in Pittsburgh, Pennsylvania.
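A minimal sketch of inverse-distribution-function random variate generation for the cardioid distribution, one of the properties listed above; the closed-form CDF is standard, and the numerical inversion by table lookup is an illustrative choice:

```python
import numpy as np

def cardioid_cdf(theta, mu=0.0, rho=0.25):
    """CDF of the cardioid distribution on [0, 2*pi): the integral of
    the density (1 + 2*rho*cos(t - mu)) / (2*pi); requires |rho| <= 1/2."""
    return (theta + 2 * rho * (np.sin(theta - mu) + np.sin(mu))) / (2 * np.pi)

def cardioid_rvs(size, mu=0.0, rho=0.25, grid=100_000):
    """Inverse-CDF sampling: map uniform draws through the numerically
    inverted cardioid CDF (fine-grid table lookup)."""
    grid_theta = np.linspace(0.0, 2 * np.pi, grid)
    u = np.random.default_rng(0).uniform(size=size)
    return np.interp(u, cardioid_cdf(grid_theta, mu, rho), grid_theta)

print(cardioid_rvs(5))  # five angles in [0, 2*pi)
```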

