Effect of cluster sampling in biomass tables construction: ratio estimators models
To minimize field costs, it is ordinarily efficient to select sample trees by cluster sampling; that is, to select several trees from the same geographical location. But when the sample trees are used to construct biomass tables, it is also customary to use regression techniques based on the simple random sampling assumption. This procedure gives rise to errors in the statistical analysis that are largely ignored. A method is shown to evaluate the validity of the regression analysis when the undefined relationship between a dependent variable y and an independent variable x can be sufficiently well approximated by a line of the form y = βx. This method is then applied to two samples of trees selected in clusters of approximately five trees. Regressions of nine variables y (various biomass components) on six variables x (functions of tree diameter and height) are then considered. It is shown that analyzing data selected by cluster sampling as if they had been selected by simple random sampling results in relatively large errors in the interval estimates of β. More specifically, the confidence intervals based on the random sampling assumption are ordinarily found to be only 60 to 80% as wide as those based on the cluster sampling assumption. Within the regression model used, however, the point estimates of β remained unchanged.
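The contrast between the two assumptions can be illustrated with a minimal Python sketch. It assumes the classical ratio estimator β̂ = Σy / Σx with its usual linearization standard error and no finite-population correction, which may differ in detail from the estimators used in the study. The function names and the eight-tree data set are hypothetical and serve only to show the mechanism: the point estimate is the same under both assumptions, while the standard error changes when clusters, rather than trees, are treated as the sampling units.

```python
from collections import defaultdict
from math import sqrt


def ratio_estimate(x, y):
    """Ratio estimator of beta in y = beta * x: sum(y) / sum(x)."""
    return sum(y) / sum(x)


def ratio_se(x, y):
    """Linearization standard error of the ratio estimator.

    Each (x, y) pair is treated as one independent sampling unit.
    Assumption: no finite-population correction is applied.
    """
    n = len(x)
    beta = ratio_estimate(x, y)
    x_bar = sum(x) / n
    # Sample variance of the residuals y - beta * x
    s2 = sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    return sqrt(s2 / n) / x_bar


def cluster_totals(x, y, cluster):
    """Sum x and y within each cluster, so clusters become the units."""
    tx, ty = defaultdict(float), defaultdict(float)
    for xi, yi, c in zip(x, y, cluster):
        tx[c] += xi
        ty[c] += yi
    keys = sorted(tx)
    return [tx[k] for k in keys], [ty[k] for k in keys]


# Hypothetical data: four clusters of two trees each. Trees in the same
# cluster deviate from y = x in the same direction (a shared site effect).
cluster = ["A", "A", "B", "B", "C", "C", "D", "D"]
x = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0]
y = [1.2, 2.4, 0.8, 1.6, 1.2, 2.4, 0.8, 1.6]

# Simple random sampling assumption: every tree is an independent unit.
beta_tree = ratio_estimate(x, y)
se_srs = ratio_se(x, y)

# Cluster sampling assumption: every cluster is an independent unit.
cx, cy = cluster_totals(x, y, cluster)
beta_cluster = ratio_estimate(cx, cy)
se_cluster = ratio_se(cx, cy)

print(f"beta (trees as units):    {beta_tree:.4f}")
print(f"beta (clusters as units): {beta_cluster:.4f}")
print(f"SE ratio, SRS / cluster:  {se_srs / se_cluster:.2f}")
```

Because the cluster totals add up to the same grand totals as the individual trees, the two point estimates coincide. The standard errors differ because residuals within a cluster are positively correlated in this made-up data set, which the tree-level calculation ignores.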