Conditional Complexity
Recently Published Documents


TOTAL DOCUMENTS: 12 (FIVE YEARS: 1)
H-INDEX: 4 (FIVE YEARS: 0)

2021 ◽  
Author(s):  
Michael Cader Nelson

Every statistical estimate is equal to the sum of a nonrandom component, due to parameter values and bias, and a random component, due to sampling error. Estimation theory suggests that the two components are hopelessly confounded in the estimate. We would like to estimate the sign and magnitude of a statistic’s random deviation from its parameter--its accuracy--in the same way we quantify a statistic’s random variability around its parameter--its precision--by estimating the standard error. However, because the random component is an attribute of the sample data, it cannot be described with parametric or Fisher information. In information theory, on the other hand, every information type--entropy, complexity--is understood as describing the extent of randomness in manifest data. This suggests that integrating the two conceptions of information could allow us to describe the two components of a statistical estimate, if only we could identify a common link between the two paradigms.

The matching statistic, m, is such a link. For paired, ranked vectors X and Y of length n, m is the total number of paired observations in X and Y with matching ranks, $m = \sum_{i=1}^{n} [R(X_i) = R(Y_i)]$, where $[\cdot]$ is the Iverson bracket. That is, m is the number of fixed points between the vectors. m has a long history in statistics, having served as the test statistic of a little-known null hypothesis statistical test (NHST) for the correlation coefficient, dating to around the turn of the twentieth century, called the matching method. Subtracting m from n yields a metric with a long history in information theory, the Hamming distance, a classic metric of the conditional complexity K(Y|X). Thus, m simultaneously contains both the Fisher information in a bivariate sample about the latent correlation and the conditional complexity, or algorithmic information, about the manifest observations.

This paper shows that the presence of these two conflicting information types in m gives the statistic a peculiar attribute: m has an asymptotic efficiency less than or equal to zero relative to conventional correlation estimators computed on the same data. This means its Fisher information content decreases with increasing sample size, so that m’s random component is disproportionately large. Furthermore, when m and Pearson’s r are computed on the same sample, the two share a random component, and the value of m is indicative of the accuracy of r with respect to that component. Having proven this utility of m by theoretical and empirical means (Monte Carlo simulations), additional matching statistics are constructed, including one composite statistic that is even more informative of the accuracy of r, and another that is indicative of the accuracy of Cohen’s d. Potential applications for computing accuracy-adjusted r are described, and implications are discussed.
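As a reading aid (not part of the abstract), here is a minimal sketch of the matching statistic and its Hamming-distance complement, assuming average ranks for ties; the function name and the use of scipy are my own choices, not the paper’s implementation:

```python
import numpy as np
from scipy.stats import rankdata

def matching_statistic(x, y):
    """Count paired observations whose ranks match: m = sum_i [R(x_i) = R(y_i)].

    The complement n - m is the Hamming distance between the two rank vectors.
    """
    rx = rankdata(x)           # ranks of x (ties receive average ranks)
    ry = rankdata(y)
    m = int(np.sum(rx == ry))  # number of fixed points between the rank vectors
    return m, len(x) - m       # (matching statistic, Hamming distance)

# Example: a positively correlated pair tends to yield more rank matches
# than chance. Under independence the ranks of y form a random permutation
# relative to x, so E[m] = 1 regardless of n.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 0.8 * x + 0.6 * rng.normal(size=20)
m, hamming = matching_statistic(x, y)
print(f"m = {m}, Hamming distance = {hamming}")
```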


2014 ◽  
Vol 79 (2) ◽  
pp. 620-632 ◽  
Author(s):  
B. BAUWENS ◽  
A. SHEN

Péter Gács showed (Gács 1974) that for every n there exists a bit string x of length n whose plain complexity C(x) has almost maximal conditional complexity relative to x, i.e., $C(C(x) \mid x) \ge \log n - \log^{(2)} n - O(1)$, where $\log^{(2)} i = \log\log i$. Following Elena Kalinina (Kalinina 2011), we provide a simple game-based proof of this result; modifying her argument, we get a better (and tight) bound of $\log n - O(1)$. We also show the same bound for prefix-free complexity.

Robert Solovay showed (Solovay 1975) that infinitely many strings x have maximal plain complexity but not maximal prefix complexity (among the strings of the same length): for some c there exist infinitely many x such that $|x| - C(x) \le c$ and $|x| + K(|x|) - K(x) \ge \log^{(2)} |x| - c \log^{(3)} |x|$. In fact, the results of Solovay and Gács are closely related. Using the result above, we provide a short proof of Solovay’s result. We also generalize it by showing that for some c and for all n there are strings x of length n with $n - C(x) \le c$ and $n + K(n) - K(x) \ge K(K(n) \mid n) - 3 K(K(K(n) \mid n) \mid n) - c$. We also prove a close upper bound of $K(K(n) \mid n) + O(1)$.

Finally, we provide a direct game proof of Joseph Miller’s generalization (Miller 2006) of Solovay’s theorem: if a co-enumerable set (a set with c.e. complement) contains a string of every length, then it contains infinitely many strings x such that $|x| + K(|x|) - K(x) \ge \log^{(2)} |x| - O(\log^{(3)} |x|)$.
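For orientation (my addition, a standard fact rather than a claim from the abstract): $\log n$ is the natural ceiling for $C(C(x) \mid x)$, which is why the paper’s tight lower bound $\log n - O(1)$ is the best possible.

```latex
% Standard one-line upper bound, assuming the usual conventions
% C(x) <= n + O(1) and C(s | y) <= |bin(s)| + O(1) for a number s.
\[
  C\bigl(C(x) \mid x\bigr)
    \;\le\; \underbrace{\log C(x)}_{\text{bits to write the number } C(x)} + O(1)
    \;\le\; \log n + O(1),
\]
% so the lower bound \log n - O(1) matches it up to an additive constant.
```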


2011 ◽  
Vol 274 (1) ◽  
pp. 90-104 ◽  
Author(s):  
Nikolay K. Vereshchagin ◽  
Andrej A. Muchnik

2011 ◽  
Vol 49 (2) ◽  
pp. 227-245 ◽  
Author(s):  
Daniil Musatov ◽  
Andrei Romashchenko ◽  
Alexander Shen

2010 ◽  
Vol 21 (03) ◽  
pp. 321-327 ◽  
Author(s):  
YEN-WU TI ◽  
CHING-LUEH CHANG ◽  
YUH-DAUH LYUU ◽  
ALEXANDER SHEN

A bit string is random (in the sense of algorithmic information theory) if it is incompressible, i.e., its Kolmogorov complexity is close to its length. Two random strings are independent if knowing one of them does not simplify the description of the other, i.e., the conditional complexity of each string (using the other as a condition) is close to its length. We may define independence of a k-tuple of strings in the same way. In this paper we address the following question: what is the maximal cardinality of a set of n-bit strings if any k elements of this set are independent (up to a certain constant)? Lower and upper bounds that match each other (with logarithmic precision) are provided.
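Kolmogorov complexity is uncomputable, so the conditional complexity in this definition can only be approximated in practice. A hedged sketch of the idea, using zlib as a stand-in compressor (my own illustration; the paper proves bounds and involves no code):

```python
import os
import zlib

def compressed_len(s: bytes) -> int:
    # Length of s under a real compressor: an upper-bound-style proxy for C(s).
    return len(zlib.compress(s, 9))

def conditional_complexity_proxy(x: bytes, y: bytes) -> int:
    # Crude stand-in for C(x | y): the extra compressed bytes x costs once y is known.
    return compressed_len(y + x) - compressed_len(y)

n = 4096
a = os.urandom(n)   # incompressible ("random") string
b = os.urandom(n)   # a second, independently drawn random string
c = a               # a fully dependent copy of a

# Independent strings: knowing a barely helps, so the proxy stays near n.
print(conditional_complexity_proxy(b, a))
# Dependent strings: a almost determines c, so the proxy collapses toward 0.
print(conditional_complexity_proxy(c, a))
```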

