Ensemble Estimation of Information Divergence †

Entropy ◽  
2018 ◽  
Vol 20 (8) ◽  
pp. 560 ◽  
Author(s):  
Kevin Moon ◽  
Kumar Sricharan ◽  
Kristjan Greenewald ◽  
Alfred Hero

Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets is derived where knowledge of the support boundary, and therefore, the boundary correction is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for the tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. We show extensive simulation results that verify the theoretical results of our paper. Finally, we apply the proposed estimator to estimate the bounds on the Bayes error rate of a cell classification problem.
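As a rough illustration of the baseline the abstract compares against, the sketch below computes a leave-one-out Gaussian kernel density plug-in estimate of the Rényi-α divergence. It is not the ensemble estimator proposed in the paper: the Gaussian kernel, the single fixed bandwidth `h`, and the function names `gaussian_kde_at` and `renyi_plugin` are assumptions made here for illustration, and no boundary handling or weighting across bandwidths is attempted.

```python
import numpy as np


def gaussian_kde_at(points, data, h, leave_one_out=False):
    """Gaussian kernel density estimate built from `data`, evaluated at `points`.

    Both arrays have shape (n, d). With leave_one_out=True, `points` must be
    the same array as `data`, and each point is excluded from its own estimate.
    """
    n, d = data.shape
    # Pairwise squared Euclidean distances, shape (len(points), n).
    sq = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-sq / (2.0 * h**2)) / ((2.0 * np.pi) ** (d / 2.0) * h**d)
    if leave_one_out:
        np.fill_diagonal(k, 0.0)
        return k.sum(axis=1) / (n - 1)
    return k.mean(axis=1)


def renyi_plugin(sample_f1, sample_f2, alpha, h):
    """Plug-in estimate of D_alpha(f1 || f2) for alpha != 1.

    Uses D_alpha = log(E_{f2}[(f1 / f2)^alpha]) / (alpha - 1), with the
    expectation replaced by an average over the points drawn from f2.
    """
    f1_hat = gaussian_kde_at(sample_f2, sample_f1, h)
    f2_hat = gaussian_kde_at(sample_f2, sample_f2, h, leave_one_out=True)
    ratio = f1_hat / f2_hat
    return np.log(np.mean(ratio**alpha)) / (alpha - 1.0)
```

For two unit-variance Gaussians whose means differ by μ, the true value is αμ²/2, which gives a quick sanity check; the smoothing introduced by `h` biases the plug-in estimate, and that bias is the kind of error the ensemble weighting is meant to reduce.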

2021 ◽  
Vol 8 (4) ◽  
pp. 309-332
Author(s):  
Efosa Michael Ogbeide ◽  
Joseph Erunmwosa Osemwenkhae

Density estimation is an important aspect of statistics, and statistical inference often requires knowledge of the density of the observed data. A common method is kernel density estimation (KDE), a nonparametric approach that requires a kernel function and a window size (smoothing parameter H) and that aids both density estimation and pattern recognition. This work focuses on the use of a modified intersection of confidence intervals (MICIH) approach to density estimation. Nigerian crime rate data reported to the Police, as published by the National Bureau of Statistics, were used to demonstrate the new approach, which is data-based and applies to multivariate kernel density estimation. Because the main way to improve a density estimate is to reduce its mean squared error (MSE), the errors of this approach were evaluated, and some improvements were seen. The aim was to achieve adaptive kernel density estimation; this was achieved under sufficient smoothing, with the adaptivity based on bandwidth selection. The estimates obtained with the MICIH approach showed some improvements in quality over existing methods: the approach has a reduced mean squared error and a relatively faster rate of convergence than some other approaches. It also produces fewer points of discontinuity in the plotted densities of the datasets, which helps to correct discontinuities and display an adaptive density. Keywords: approach, bandwidth, estimate, error, kernel density


Geophysics ◽  
1971 ◽  
Vol 36 (2) ◽  
pp. 261-265 ◽  
Author(s):  
James N. Galbraith

Prediction error filtering has been widely used for deconvolution. The mean squared error in prediction is a monotonically nonincreasing function of operator length, and the value of the error is readily available from the Wiener‐Levinson algorithm. In general, the value of this error for the infinitely long operator is not known a priori. It is shown that the final value of the error can be obtained by considering the Kolmogorov spectrum factorization. Simple criteria can then be established for operator effectiveness and length.
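The monotone behaviour of the prediction error can be seen in a minimal sketch of the Levinson recursion, which returns the error power at every operator length as a by-product. The function name `levinson_errors` and the sign convention (prediction-error filter `[1, a1, ..., ap]`, so that the error is `x[t] + a1*x[t-1] + ...`) are choices made here, not taken from the paper, and the Kolmogorov factorization step is not shown.

```python
import numpy as np


def levinson_errors(r, max_order):
    """Levinson recursion on autocorrelation lags r[0..max_order].

    Returns (errors, a): errors[m] is the prediction-error power of the
    order-m operator, and a is the final prediction-error filter with a[0] = 1.
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(max_order + 1)
    a[0] = 1.0
    err = r[0]
    errors = [err]
    for m in range(1, max_order + 1):
        # Reflection coefficient of order m: -(r[m] + sum_i a[i] r[m-i]) / err.
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        # Update coefficients a[1..m] from the order-(m-1) solution.
        prev = a.copy()
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        # Each step multiplies the error by (1 - k^2) <= 1, so it never grows.
        err *= 1.0 - k * k
        errors.append(err)
    return np.array(errors), a
```

For a first-order autoregressive autocorrelation such as `r = 0.8 ** np.arange(6)`, the error drops to `1 - 0.8**2` at length one and then stays flat, a simple case in which a longer operator brings no further gain.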


2013 ◽  
Vol 67 (11) ◽  
Author(s):  
Apilak Worachartcheewan ◽  
Chanin Nantasenamat ◽  
Chartchalerm Isarankura-Na-Ayudhya ◽  
Virapong Prachayasittikul

A data set of amidino bis-benzimidazoles, in particular 2′-arylsubstituted-1H,1′H-[2,5′]bisbenzimidazolyl-5-carboximidine derivatives with anti-malarial activity against Plasmodium falciparum, was employed in investigating the quantitative structure-activity relationship (QSAR). Quantum chemical and molecular descriptors were obtained from B3LYP/6-31g(d) calculations and Dragon software, respectively. Significant variables, which included total energy (E_T), highest occupied molecular orbital (HOMO), Moran autocorrelation-lag3/weighted by atomic masses (MATS3m), Geary autocorrelation-lag8/weighted by atomic masses (GATS8m), and 3D-MoRSE signal 11/weighted by atomic Sanderson electronegativities (Mor11e), were used in the construction of QSAR models using multiple linear regression (MLR) and artificial neural network (ANN). The results indicated that the predictive models for both the MLR and ANN approaches using leave-one-out cross-validation afforded a good performance in modelling the anti-malarial activity against P. falciparum, as observed by correlation coefficients of leave-one-out cross-validation (R_LOO-CV) of 0.9760 and 0.9821, respectively, root mean squared error of leave-one-out cross-validation (RMSE_LOO-CV) of 0.1301 and 0.1102, respectively, and predictivity of leave-one-out cross-validation (Q²_LOO-CV) of 0.9526 and 0.9645, respectively. Model validation was performed using an external testing set and the results suggested that the model provided good predictivity for both MLR and ANN models, with correlation coefficient of the external set (R_Ext) values of 0.9978 and 0.9844, respectively, root mean squared error of the external set (RMSE_Ext) of 0.0764 and 0.1302, respectively, and predictivity of the external set (Q²_Ext) of 0.9956 and 0.9690, respectively. Furthermore, the robustness of the QSAR models is corroborated by a number of statistical parameters, comprising adjusted correlation coefficient (R²_Adj), standard deviation (s), predicted residual sum of squares (PRESS), standard error of prediction (SDEP), total sum of squares deviation (SSY), and quality factor (Q). The QSAR models so constructed provide pertinent insights for the future design of anti-malarial agents.
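The leave-one-out statistics quoted above can be computed for a multiple linear regression along the following lines. This is a minimal sketch under common conventions (Q² = 1 − PRESS/SSY, RMSE = sqrt(PRESS/n)); the abstract does not spell out the exact formulas the authors used, and the function name `loo_cv_linear` and the dictionary keys are hypothetical.

```python
import numpy as np


def loo_cv_linear(X, y):
    """Leave-one-out cross-validation of an ordinary least-squares model.

    X has shape (n, p) and y has shape (n,). Each observation is predicted
    from a model fitted to the remaining n - 1 observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        preds[i] = A[i] @ coef
    press = float(np.sum((y - preds) ** 2))   # predicted residual sum of squares
    ssy = float(np.sum((y - y.mean()) ** 2))  # total sum of squares deviation
    return {
        "press": press,
        "rmse": float(np.sqrt(press / n)),
        "q2": 1.0 - press / ssy,
        "r": float(np.corrcoef(y, preds)[0, 1]),
    }
```

The same held-out predictions could be produced by any other model, so an ANN could be scored in the same way by swapping the least-squares fit for a different fitting routine.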


1994 ◽  
Vol 115 (2) ◽  
pp. 335-363 ◽  
Author(s):  
Stephen Man Sing Lee

A parametric bootstrap estimate (PB) may be more accurate than its non-parametric version (NB) if the parametric model upon which it is based is, at least approximately, correct. Construction of an optimal estimator based on both PB and NB is pursued with the aim of minimizing the mean squared error. Our approach is to pick an empirical estimate of the optimal tuning parameter ε∈[0, 1] which minimizes the mean square error of εNB+(1−ε) PB. The resulting hybrid estimator is shown to be more reliable than either PB or NB uniformly over a rich class of distributions. Theoretical asymptotic results show that the asymptotic error of this hybrid estimator is quite close in distribution to the smaller of the errors of PB and NB. All these errors typically have the same convergence rate of order . A particular example is also presented to illustrate the fact that this hybrid estimate can indeed be strictly better than either of the pure bootstrap estimates in terms of minimizing mean squared error. Two simulation studies were conducted to verify the theoretical results and demonstrate the good practical performance of the hybrid method.
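The role of the tuning parameter can be made concrete with a short derivation. This is a sketch rather than the paper's own argument: θ denotes the quantity being estimated, and a = NB − θ and b = PB − θ are notation introduced here for the two bootstrap errors.

```latex
\begin{align*}
\mathrm{MSE}(\varepsilon)
  &= \mathbb{E}\big[(\varepsilon\,\mathrm{NB} + (1-\varepsilon)\,\mathrm{PB} - \theta)^2\big]
   = \mathbb{E}\big[(b + \varepsilon\,(a - b))^2\big] \\
  &= \mathbb{E}[b^2] + 2\varepsilon\,\mathbb{E}[b\,(a-b)] + \varepsilon^2\,\mathbb{E}[(a-b)^2], \\
\frac{d}{d\varepsilon}\,\mathrm{MSE}(\varepsilon) = 0
  &\;\Longrightarrow\;
  \varepsilon^{*} = \frac{\mathbb{E}[b\,(b-a)]}{\mathbb{E}[(a-b)^2]}
   = \frac{\mathbb{E}[(\mathrm{PB}-\theta)(\mathrm{PB}-\mathrm{NB})]}
          {\mathbb{E}[(\mathrm{PB}-\mathrm{NB})^2]}.
\end{align*}
```

Because the mean squared error is a convex quadratic in ε, its minimizer over [0, 1] is ε* clipped to that interval. The expectations involve the unknown θ, which is why an empirical estimate of the optimal ε is needed in practice.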


2019 ◽  
Author(s):  
André Beauducel ◽  
Martin Kersting

Until now there has been no successful exploration of a priori unknown faceted structure by means of exploratory factor analysis (EFA) of the measured variables (items or tasks). For this reason, we investigate by means of a simulation study how well methods for factor rotation can identify a two-facet orthogonal simple structure. Samples were generated from orthogonal two-facet population factor models with 4 (2 factors per facet) to 12 factors (6 factors per facet) and submitted to factor analysis with subsequent Varimax, Equamax, Parsimax, Factor Parsimony, Tandem I, Tandem II, Infomax, and McCammon’s Minimum Entropy rotation. As a benchmark, orthogonal target rotation of the sample loadings towards the corresponding faceted population loadings was also investigated. The conditions were sample size (n = 400, 1,000), number of factors (q = 4-12), and main loading size (l = .40, .50, .60). Mean congruence coefficients of the sample loading matrices with the corresponding population loading matrices and the root mean squared error between sample loading matrices and corresponding population loading matrices were used as dependent measures. For fewer than six factors, Infomax and McCammon’s Minimum Entropy rotation yielded the highest similarity of sample loading matrices with faceted population loading matrices; for six or more factors, Tandem II rotation did. Analysis of data from 393 participants who completed a test for the Berlin Model of Intelligence Structure revealed that the faceted structure of this model could be found by means of target rotation of task aggregates corresponding to the cross-products of the facets. Moreover, McCammon’s Minimum Entropy rotation resulted in a loading pattern corresponding to the model, although the factor for figural intelligence was only weakly represented. Implications for the identification of faceted models by means of factor rotation are discussed.


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

Nowadays there is a growing demand for the development and application of technologies and methods that enable rapid, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range of the electromagnetic spectrum (350–2500 nm), meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, that aims to determine the composition of soils. We built and tested predictive models, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR), that allow the organic carbon and CaCO3 content of soils to be estimated. In testing the models, we found that the procedure gave high R² values for both soil parameters [R² (organic carbon) = 0.815; R² (CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE (organic carbon) = 0.467; RMSE (CaCO3) = 3.508] and could be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we concluded that the combined application of reflectance spectroscopy and multivariate chemometric procedures can provide a rapid and cost-effective method for data collection and evaluation.

