Performance of Kernel Estimator and Johnson SB Function for Modeling Diameter Distribution of Black Alder (Alnus glutinosa (L.) Gaertn.) Stands

Forests ◽  
2020 ◽  
Vol 11 (6) ◽  
pp. 634 ◽  
Author(s):  
Piotr Pogoda ◽  
Wojciech Ochał ◽  
Stanisław Orzeł

We compare the usefulness of nonparametric and parametric methods of diameter distribution modeling. The nonparametric method was represented by a new tool, the kernel estimator of the cumulative distribution function, with bandwidths of 1 cm (KE1) and 2 cm (KE2) and a bandwidth obtained automatically (KEA). The Johnson SB (JSB) function represented the parametric method. The data set consisted of 7867 measurements made at breast height in 360 sample plots established in 36 managed black alder (Alnus glutinosa (L.) Gaertn.) stands located in southeastern Poland. Model performance was assessed using leave-one-plot-out cross-validation and goodness-of-fit measures: mean error, root mean squared error, and the Kolmogorov–Smirnov and Anderson–Darling statistics. The model based on KE1 fit the diameters forming the training sets well, whereas KEA fit them poorly. The frequencies of diameters forming the test sets were fitted well by KEA and poorly by KE1. KEA develops more general models that can be used to approximate independent data sets. Models based on KE1 closely follow local irregularities in diameter frequency, which may be an advantage in some situations and a drawback in others because of the risk of overfitting. Applying the JSB function to the training sets resulted in the worst fit among the developed models. The performance of the parametric method on the test sets varied depending on the criterion used. Like KEA, the JSB function gives more general models that emphasize the rough shape of the approximated distribution. Site type and stand age do not affect the fit of the nonparametric models. The JSB function shows a slightly better fit in older stands.
The differences between the average values of Kolmogorov–Smirnov (KS), Anderson–Darling (AD), and root mean squared error (RMSE) statistics calculated for models developed with test sets were statistically nonsignificant, which indicates the similar usefulness of the investigated methods for modeling diameter distribution.
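The bandwidth trade-off described in this abstract can be illustrated with a minimal sketch. It assumes a Gaussian kernel (the abstract does not name the kernel), and the function names `kernel_cdf` and `ks_statistic` as well as the diameter values are hypothetical, chosen only for illustration.

```python
import math

def kernel_cdf(sample, x, h):
    """Kernel estimate of the CDF at x with bandwidth h (same units as the data).

    Assumption: a Gaussian kernel, so each observation contributes a normal CDF.
    """
    z = h * math.sqrt(2.0)
    return sum(0.5 * (1.0 + math.erf((x - d) / z)) for d in sample) / len(sample)

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of sample and cdf."""
    xs = sorted(sample)
    n = len(xs)
    dist = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare the model with the empirical CDF just after and just before x.
        dist = max(dist, abs((i + 1) / n - f), abs(f - i / n))
    return dist

# Hypothetical diameters at breast height (cm): one training plot, one test plot.
train = [12.1, 13.4, 14.0, 15.2, 15.9, 16.3, 17.8, 18.5, 20.1, 22.7]
test = [11.8, 13.9, 14.6, 15.5, 16.8, 17.1, 19.0, 21.4]

for h in (1.0, 2.0):
    model = lambda x, h=h: kernel_cdf(train, x, h)
    print(f"h={h} cm  KS(train)={ks_statistic(train, model):.3f}  "
          f"KS(test)={ks_statistic(test, model):.3f}")
```

Comparing the KS distance on the training data with the distance on held-out data for each bandwidth mirrors, in a very reduced form, the leave-one-plot-out comparison of KE1 and KE2.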

Forests ◽  
2019 ◽  
Vol 10 (5) ◽  
pp. 412 ◽  
Author(s):  
Piotr Pogoda ◽  
Wojciech Ochał ◽  
Stanisław Orzeł

We present diameter distribution models for black alder (Alnus glutinosa (L.) Gaertn.) derived from diameter measurements made at breast height in 844 circular sample plots set in 163 managed stands located in south-eastern Poland. A total of 22,530 trees were measured. Stand age ranged from six to 89 years. The model formulation was based on the two-parameter Weibull function and a non-parametric percentile-based method. Weibull function parameters were recovered from the first raw and second central moments estimated using the stand quadratic mean diameter. The same stand characteristic was used to predict the values of 12 percentiles in the percentile-based method. Model performance was assessed using k-fold cross-validation. The goodness-of-fit statistics include the Kolmogorov–Smirnov statistic, mean error, root mean squared error, and two variants of the error index introduced by Reynolds. The developed percentile model accurately predicted diameter distributions in 88.4% of black alder stands, compared with 81.9% for the Weibull model (Kolmogorov–Smirnov test). Alternative statistical metrics assessing goodness-of-fit to empirical distributions suggested that the non-parametric percentile model was superior to the parametric Weibull model, especially in stands older than 20 years. In younger stands, the two models were accurate in only 57% of the cases and did not differ significantly with respect to goodness-of-fit measures.
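Parameter recovery from moments can be sketched as follows. This is a generic illustration and not the authors' procedure: it assumes the mean (first raw moment) and variance (second central moment) are already available, uses the standard two-parameter Weibull moment formulas, and solves for the shape by bisection. The function name `weibull_from_moments` and the input values are hypothetical.

```python
import math

def weibull_from_moments(mean, variance, lo=0.2, hi=50.0, tol=1e-10):
    """Recover Weibull shape c and scale b from the mean and variance.

    Uses mean = b * G(1 + 1/c) and variance = b^2 * (G(1 + 2/c) - G(1 + 1/c)^2),
    where G is the gamma function.
    """
    target = variance / mean ** 2  # squared coefficient of variation

    def cv2(c):
        g1 = math.gamma(1.0 + 1.0 / c)
        g2 = math.gamma(1.0 + 2.0 / c)
        return g2 / g1 ** 2 - 1.0

    if not cv2(hi) <= target <= cv2(lo):
        raise ValueError("moments outside the shape range searched")

    # cv2 decreases as the shape grows, so bisection finds the matching shape.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale

# Hypothetical stand: mean diameter 18 cm, variance 16 cm^2.
mean_d, var_d = 18.0, 16.0
# With the population variance, quadratic mean diameter^2 = mean^2 + variance.
dq = math.sqrt(mean_d ** 2 + var_d)
shape, scale = weibull_from_moments(mean_d, var_d)
print(f"dq={dq:.2f} cm  shape={shape:.3f}  scale={scale:.3f}")
```

The identity linking the quadratic mean diameter to the two moments is what makes a single stand characteristic a plausible predictor for both, although the regression equations used in the paper are not given in the abstract.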


2009 ◽  
Vol 26 (1) ◽  
pp. 94-118 ◽  
Author(s):  
David Tomás Jacho-Chávez

This paper characterizes the bandwidth value (h) that is optimal for estimating parameters of the form $\eta \, = \,E\left[ {\omega /f_{V|U} \left({V|U} \right)} \right]$, where the conditional density of a scalar continuous random variable V, given a random vector U, $f_{V|U} $, is replaced by its kernel estimator. That is, the parameter η is the expectation of ω inversely weighted by $f_{V|U} $, and it is the building block of various semiparametric estimators already proposed in the literature such as Lewbel (1998), Lewbel (2000b), Honoré and Lewbel (2002), Khan and Lewbel (2007), and Lewbel (2007). The optimal bandwidth is derived by minimizing the leading terms of a second-order mean squared error expansion of an in-probability approximation of the resulting estimator with respect to h. The expansion also demonstrates that the bandwidth can be chosen on the basis of bias alone, and that a simple “plug-in” estimator for the optimal bandwidth can be constructed. Finally, the small sample performance of our proposed estimator of the optimal bandwidth is assessed by a Monte Carlo experiment.


Author(s):  
Leila MOFTAKHAR ◽  
Mozhgan SEIF ◽  
Marziyeh Sadat SAFE

Background: The outbreak of COVID-19 is spreading rapidly around the world and has become a pandemic. To support better planning of interventions, this study was conducted to forecast the number of daily new COVID-19 cases in Iran for the next thirty days. Methods: Data on observed new cases in Iran from 19th Feb to 30th Mar 2020 were used to predict the number of patients until 29th Apr. Artificial Neural Network (ANN) and Auto-Regressive Integrated Moving Average (ARIMA) models were applied for prediction. The data were taken from the daily reports of the Iran Ministry of Health and open datasets provided by Johns Hopkins. To compare the models, the dataset was split into train and test sets. Mean Squared Error (MSE) and Mean Absolute Error (MAE) were the comparison criteria. Results: Both algorithms forecasted an exponential increase in the number of newly infected patients. If the spreading pattern continues as before, the number of daily new cases would be 7872 and 9558 by 29th Apr according to ANN and ARIMA, respectively. Model comparison confirmed that the ARIMA prediction was more accurate than the ANN prediction. Conclusion: COVID-19 is a contagious disease and has infected many people in Iran. Our results are a warning for health policy planners and decision-makers to make timely decisions, control the disease, and provide the equipment needed.
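The two comparison criteria named in the Methods can be written out in a few lines. The sketch below uses made-up held-out counts and two made-up forecasts (`forecast_a`, `forecast_b`); none of the numbers come from the study.

```python
def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily new cases in a held-out test period and two forecasts.
observed = [100, 120, 150, 170]
forecast_a = [110, 115, 140, 180]
forecast_b = [90, 140, 150, 200]

for name, forecast in (("A", forecast_a), ("B", forecast_b)):
    print(name, mse(observed, forecast), mae(observed, forecast))
```

MSE penalizes large misses more heavily than MAE, so reporting both helps show whether one model's advantage comes from avoiding a few large errors or from being closer on average.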


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Heru Nugroho ◽  
Nugraha Priya Utama ◽  
Kridanto Surendro

A significant advancement that occurs during the data cleaning stage is estimating missing data. Studies have shown that improper data handling leads to inaccurate analysis. Furthermore, most studies indicate the occurrence of missing data irrespective of the correlation between attributes. However, an adaptive search procedure helps to determine the estimates of the missing data when correlations between attributes are considered in the process. Firefly Algorithm (FA) implements an adaptive search procedure in the imputation of the missing data by determining the estimated value closest to others' value. Therefore, this study proposes a class center-based adaptive approach model for retrieving missing data by considering the attribute correlation in the imputation process (C3-FA). The result showed that the class center-based firefly algorithm (FA) is an efficient technique for obtaining the actual value in handling missing data with the Pearson correlation coefficient (r) and root mean squared error (RMSE) close to 1 and 0, respectively. In addition, the proposed method has the ability to maintain the true distribution of data values. This is indicated by the Kolmogorov–Smirnov test, which stated that the value of DKS for most attributes in the dataset is generally closer to 0. Furthermore, the accuracy evaluation results using three classifiers showed that the proposed method produces good accuracy.



2020 ◽  
Vol 2020 ◽  
pp. 1-17
Author(s):  
Ghadah Alomani ◽  
Refah Alotaibi ◽  
Sanku Dey ◽  
Mahendra Saha

The process capability index (PCI) has been introduced as a tool to aid in the assessment of process performance. Conventional PCIs usually perform well for normally distributed quality characteristics. However, when these PCIs are employed to evaluate nonnormally distributed processes, they often provide inaccurate results. In this article, to estimate the PCI Spmk when the process follows the power Lindley distribution, seven classical methods of estimation are first considered: maximum likelihood, ordinary and weighted least squares, Cramér–von Mises, maximum product of spacings, Anderson–Darling, and right-tail Anderson–Darling. The performance of these methods is compared on the basis of their mean squared error. Next, three bootstrap confidence intervals (BCIs) of the PCI Spmk, namely, standard bootstrap, percentile bootstrap, and bias-corrected percentile bootstrap, are considered and compared in terms of their average width, coverage probability, and relative coverage. In addition, a new cost-effective PCI, Spmkc, is introduced by incorporating a tolerance cost function into the index Spmk. To evaluate the performance of the methods of estimation and the BCIs, a simulation study is carried out. The simulation results showed that the maximum likelihood method performs better than its counterparts in terms of mean squared error, while the bias-corrected percentile bootstrap provides a smaller confidence length (width) and higher relative coverage than the standard bootstrap and percentile bootstrap across sample sizes. Finally, two real data examples are provided to investigate the performance of the proposed procedures.
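Of the three interval types mentioned, the percentile bootstrap is the simplest to sketch. The example below is generic: it applies the method to the sample mean as a stand-in statistic, because the abstract does not give the formula for Spmk. The function name `percentile_bootstrap_ci` and the measurements are hypothetical.

```python
import random
import statistics

def percentile_bootstrap_ci(data, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for statistic(data) at level 1 - alpha."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical quality-characteristic measurements.
measurements = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.7, 10.2, 9.5, 10.6, 10.0]
low, high = percentile_bootstrap_ci(measurements, statistics.mean)
print(f"95% percentile bootstrap interval for the mean: ({low:.3f}, {high:.3f})")
```

Any estimator that maps a sample to a single number, including a capability index, could be passed as `statistic`; the average width and coverage compared in the abstract would then be obtained by repeating this over many simulated samples.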


2009 ◽  
Vol 8 (2) ◽  
pp. 1
Author(s):  
I W. MANGKU

Convergence of the MSE (mean squared error) of a uniform kernel estimator for the intensity of a periodic Poisson process with unknown period is presented and proved. The result presented here is a special case of the one in [3]. The aim of this paper is to present an alternative and relatively simpler proof of convergence for the MSE of the estimator compared to the one in [3]. This is joint work with R. Helmers and R. Zitikis.
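To make the object of study concrete, here is a minimal sketch of a uniform kernel intensity estimator. It rests on two assumptions not stated in the abstract: the period is treated as known (the paper deals with an unknown period, which would have to be estimated first), and the estimator averages event counts in windows of half-width h centred at s + k·period across the observed periods. The function name and the data are hypothetical.

```python
import math

def uniform_kernel_intensity(events, s, period, window, h):
    """Estimate the intensity at s from event times observed on [0, window].

    Assumption: known period; counts in [s + k*period - h, s + k*period + h]
    are summed over k, divided by 2h, and scaled by period / window.
    """
    k_min = math.floor((-s - h) / period)
    k_max = math.ceil((window - s + h) / period)
    count = 0
    for k in range(k_min, k_max + 1):
        center = s + k * period
        lo = max(0.0, center - h)       # clip the kernel window to [0, window]
        hi = min(window, center + h)
        if lo <= hi:
            count += sum(1 for t in events if lo <= t <= hi)
    return (period / window) * count / (2.0 * h)

# Hypothetical event times: exactly one event in the middle of each unit period.
events = [k + 0.5 for k in range(10)]
print(uniform_kernel_intensity(events, s=0.5, period=1.0, window=10.0, h=0.1))
print(uniform_kernel_intensity(events, s=0.0, period=1.0, window=10.0, h=0.1))
```

With real Poisson data the bandwidth h controls the usual trade-off between bias and variance, which is what an MSE convergence result quantifies as the observation window grows.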


2012 ◽  
Vol 61 (2) ◽  
pp. 277-290 ◽  
Author(s):  
Ádám Csorba ◽  
Vince Láng ◽  
László Fenyvesi ◽  
Erika Michéli

There is a growing demand for technologies and methods that enable fast, cost-effective, and environmentally friendly soil data collection and evaluation. Reflectance spectroscopy, which is based on reflectance measurements in the visible (VIS) and near-infrared (NIR) range (350–2500 nm) of the electromagnetic spectrum, meets these demands. Because the reflectance spectrum recorded from soils is very rich in information, and many soil constituents have a characteristic spectral "fingerprint" in the studied range, a large number of key soil parameters can be determined simultaneously from a single curve. In this paper we present the first steps of a methodological development, built on the fundamentals of reflectance spectroscopy, aimed at determining soil composition. We created and tested predictive models, based on multivariate mathematical-statistical methods (partial least squares regression, PLSR), that allow the organic carbon and CaCO3 content of soils to be estimated. Testing of the models showed that the procedure gave high R2 values for both soil parameters [R2(organic carbon) = 0.815; R2(CaCO3) = 0.907]. The root mean squared error (RMSE), which indicates the accuracy of the estimate, can be considered moderate for both parameters [RMSE(organic carbon) = 0.467; RMSE(CaCO3) = 3.508] and can be improved considerably by standardizing the reflectance measurement protocols. Based on our investigations, we concluded that the combined use of reflectance spectroscopy and multivariate chemometric procedures can provide a fast and cost-effective method of data collection and evaluation.

