Modelling Sparse Interaction Matrices: Interward Migration in Hereford and Worcester, and the Underdispersion Problem

1993 ◽  
Vol 25 (8) ◽  
pp. 1201-1209 ◽  
Author(s):  
P J Boyle ◽  
R Flowerdew

Recently, it has been argued that migration models in which ordinary least squares regression is used are inappropriate, because the dependent variable (number of migrants) is a count. Instead, a Poisson regression approach can be adopted. Goodness of fit can be evaluated by using a deviance (log-likelihood) or X² statistic, which can be referred to a χ² distribution with the appropriate degrees of freedom. In this paper, such an approach is used to model ward-level migration flows within the county of Hereford and Worcester. However, it is shown that for this exceedingly sparse data set the deviance figures attained are very low, suggesting that there may be a problem of underdispersion. This is in contrast to the overdispersion which has been identified as a common problem in Poisson models. The low deviance figures arise from the large number of zeros and small flows in the data matrix, which invalidate the usual χ² goodness-of-fit test. A simulation approach to the assessment of model goodness of fit is suggested, and the results from applying it to the Hereford and Worcester data set are described.
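The simulation approach the abstract proposes can be sketched as a parametric bootstrap of the deviance: instead of referring the observed deviance to a χ² distribution (which breaks down for sparse counts), its null distribution is simulated from the fitted model. Everything below is a synthetic stand-in, assuming an intercept-only Poisson model rather than the authors' actual migration model.

```python
import numpy as np

rng = np.random.default_rng(42)

def poisson_deviance(y, mu):
    """Poisson deviance, with the 0 * log(0) convention handled explicitly."""
    pos = y > 0
    term = np.zeros_like(mu)
    term[pos] = y[pos] * np.log(y[pos] / mu[pos])
    return 2.0 * np.sum(term - (y - mu))

# Sparse "migration flow" counts: many zeros and small values (synthetic).
y = rng.poisson(lam=0.3, size=200)
mu_hat = np.full(y.size, y.mean())          # intercept-only MLE fit
d_obs = poisson_deviance(y, mu_hat)

# Parametric bootstrap of the deviance under the fitted model, in place of
# the invalid chi-square(n - 1) reference distribution.
d_sim = np.empty(1000)
for i in range(1000):
    y_star = rng.poisson(lam=y.mean(), size=y.size)
    d_sim[i] = poisson_deviance(y_star, np.full(y_star.size, y_star.mean()))

p_value = np.mean(d_sim >= d_obs)
```

For sparse data the simulated deviances typically sit well below the nominal degrees of freedom, which is exactly the underdispersion symptom the abstract describes.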

2005 ◽  
Vol 10 (4) ◽  
pp. 226-231 ◽  
Author(s):  
Juha Laine ◽  
Anja Noro ◽  
Harriet Finne-Soveri ◽  
Unto Häkkinen

Objectives: To explore the patient- and ward-level determinants of wage-adjusted nursing time in long-term care wards for the elderly with a view to improving efficiency of the use of labour resources. Methods: The wage-adjusted nursing time given to patients was obtained from a time measurement study. Patient characteristics were based on the Minimum Data Set 2.0 of the Resident Assessment Instrument for nursing homes. Ordinary least-squares regression analysis and multilevel modelling were used to disentangle the effects of patient- and ward-level factors on nursing time. Results: A significant difference in wage-adjusted nursing time between wards was detected, which was partly explained by the characteristics of patients in the wards. The combination of patients' physical functioning and cognition and the Resource Utilization Groups (RUG-III/22) resource-use classification explained 20–25% of patients' nursing time over a 24-hour period. Variables related to the operational environment of the ward did not explain differences in wage-adjusted nursing time once the patient profile of the ward had been controlled for. The results also showed notable unmeasured patient- and ward-level effects, inefficiency and randomness in the allocation of nursing time. Conclusions: By improving the allocation and use of labour resources, the substantial variation in nursing time between wards could be diminished. Managers should allocate their staff primarily according to patients' resource needs.
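The multilevel decomposition of nursing time into ward-level and patient-level variance can be sketched with a one-way random-effects ANOVA estimator. The ward counts, sample sizes and variance components below are invented for illustration; they are not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic nursing-time data: 12 wards, 20 patients each (illustrative only).
n_wards, n_per = 12, 20
ward_effect = rng.normal(0.0, 1.0, size=n_wards)              # ward-level spread
times = ward_effect[:, None] + rng.normal(0.0, 2.0, size=(n_wards, n_per))

# One-way ANOVA estimators of the within- and between-ward variance components.
grand = times.mean()
ward_means = times.mean(axis=1)
ms_within = np.sum((times - ward_means[:, None]) ** 2) / (n_wards * (n_per - 1))
ms_between = n_per * np.sum((ward_means - grand) ** 2) / (n_wards - 1)
var_ward = max((ms_between - ms_within) / n_per, 0.0)

# Intraclass correlation: the share of total variance sitting between wards.
icc = var_ward / (var_ward + ms_within)
```

A non-trivial ICC corresponds to the "significant difference between wards" the abstract reports; patient-level covariates would then be added to see how much of it they absorb.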


2009 ◽  
Vol 2009 ◽  
pp. 1-8 ◽  
Author(s):  
Janet Myhre ◽  
Daniel R. Jeske ◽  
Michael Rennie ◽  
Yingtao Bi

A heteroscedastic linear regression model is developed from plausible assumptions that describe the time evolution of performance metrics for equipment. The inherent motivation for the related weighted least squares analysis of the model is an essential and attractive selling point to engineers with interest in equipment surveillance methodologies. A simple test for the significance of the heteroscedasticity suggested by a data set is derived, and a simulation study is used to evaluate the power of the test and compare it with several other applicable tests that were designed under different contexts. Tolerance intervals within the context of the model are derived, thus generalizing well-known tolerance intervals for ordinary least squares regression. Use of the model and its associated analyses is illustrated with an aerospace application where hundreds of electronic components are continuously monitored by an automated system that flags components that are suspected of unusual degradation patterns.
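The workflow of testing for heteroscedasticity and then reweighting can be sketched as follows. The paper derives its own test; here a Breusch-Pagan-style LM test and a known variance-growth structure are used as stand-ins, on entirely synthetic "degradation" data.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Synthetic degradation data: noise standard deviation grows with time.
n = 300
t = np.linspace(1.0, 10.0, n)
y = 2.0 + 0.5 * t + rng.normal(0.0, 0.2 * t)

X = np.column_stack([np.ones(n), t])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# Breusch-Pagan-style LM test: regress the squared residuals on t and
# compare n * R^2 with a chi-square(1) distribution.
u = resid ** 2
g, *_ = np.linalg.lstsq(X, u, rcond=None)
r2 = 1.0 - np.sum((u - X @ g) ** 2) / np.sum((u - u.mean()) ** 2)
lm_stat = n * r2
p_value = chi2.sf(lm_stat, df=1)

# Weighted least squares, assuming variance proportional to t^2.
w = 1.0 / t ** 2
Xw = X * np.sqrt(w)[:, None]
beta_wls, *_ = np.linalg.lstsq(Xw, y * np.sqrt(w), rcond=None)
```

With variance growing in time, the LM test rejects homoscedasticity and the WLS fit downweights the noisy late observations, which is the qualitative behaviour the model in the abstract is built to exploit.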


1971 ◽  
Vol 97 (2-3) ◽  
pp. 325-330 ◽  
Author(s):  
J. H. Pollard

In his 1941 paper, Seal included details of some experiments he performed in an attempt to estimate the appropriate number of degrees of freedom for the chi-square goodness-of-fit test of a summation-formula graduation. These results are cited by Tetley and by Benjamin and Haycocks in their textbooks when they discuss the difficulty of determining the number of degrees of freedom, or the mean chi-square value.


2019 ◽  
Vol 80 (1) ◽  
pp. 38-50
Author(s):  
Kozo Harimaya ◽  
Koichi Kagitani

Purpose: The purpose of this paper is to investigate the efficiency of the banking business of Japan's agricultural cooperatives (JAs), which depend heavily on financial business with non-farmers, contrary to cooperative principles. Design/methodology/approach: The authors construct a panel data set over 2005–2016 from the financial statements of JAs' prefectural-level federations and use the input distance stochastic frontier model with a time-variant inefficiency effect for analysis. Both the flow and stock measures of banking output are used in identical models and the efficiency results are compared. The authors also investigate the determinants of efficiency by using Tobit and ordinary least squares regression models. Findings: There is strong evidence of significant prefectural differences in efficiency values. The ratio of lending to non-members to total loans is positively related to efficiency. In contrast, higher reliance on a central organization and on the credit business leads to lower efficiency. Research limitations/implications: Apart from banking, JAs provide mutual insurance business services. As the authors investigate only the efficiency of JAs' banking business in this study, it would be necessary to investigate the efficiency of their insurance business as well when evaluating JAs' overall financial business. Originality/value: There are few studies that investigate the efficiency of JAs' banking business and its determinants, although significant attention has been paid to their excessive dependence on the financial business.
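Stochastic frontier estimation requires maximum likelihood with a composed error term; a simpler deterministic cousin, corrected OLS (COLS), conveys the idea of efficiency scoring and can be sketched compactly. The production relationship, unit count and variable names below are all hypothetical, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic banking data: log output vs log inputs for 47 hypothetical
# prefectural federations, with a one-sided inefficiency term.
n = 47
log_labour = rng.normal(4.0, 0.5, n)
log_funds = rng.normal(8.0, 0.7, n)
ineff = rng.exponential(0.15, n)
log_output = 0.6 * log_labour + 0.4 * log_funds - ineff + rng.normal(0, 0.05, n)

X = np.column_stack([np.ones(n), log_labour, log_funds])
beta, *_ = np.linalg.lstsq(X, log_output, rcond=None)
resid = log_output - X @ beta

# COLS: shift the fitted frontier up to the best performer, so every unit's
# efficiency score lies in (0, 1] and the best unit scores exactly 1.
efficiency = np.exp(resid - resid.max())
```

The second-stage analysis in the abstract would then regress scores like `efficiency` on candidate determinants (e.g. the non-member lending ratio) with Tobit or OLS.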


1993 ◽  
Vol 20 (1) ◽  
pp. 133-143 ◽  
Author(s):  
David Hansen ◽  
Dale I. Bray

Sediment rating curves in conjunction with daily flow data have often been used to estimate the total mass of sediment flowing past a given river cross section over relatively long periods of time. Techniques are presented that seek to make the best use of limited noncontinuous suspended sediment concentration data to generate nine partial years of suspended sediment load by means of sediment rating curves for the Kennebecasis River, N.B. (drainage area of 1100 km²). Initially, the data were partitioned in an attempt to improve correlations between concentration and discharge. Such partitioning by season, month, periods of rising stage, and periods of falling stage did not uniformly improve correlations as compared with the correlations for nonpartitioned data. Various combinations of less well-known methods were then used, including a moving intercept method that makes greater use of point concentration observations in time, and correction factor methods for simple power-type relations as suggested by Ferguson and by Duan. In addition, the validity of some of the underlying assumptions for performing ordinary least-squares regression is examined for this data set. Finally, the effect of daily flow averaging on the computed load was examined and found to be small for this basin. Key words: suspended sediment, C–Q rating curves, flow averaging, washload estimates, statistical bias, regression estimates.
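The correction-factor idea the abstract attributes to Duan is the smearing estimator: a log-log power-law rating curve is fitted by OLS, and the naive back-transform is multiplied by the mean of the exponentiated residuals to correct retransformation bias. The C-Q data below are synthetic, not the Kennebecasis record.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic C-Q data: power-law rating curve with lognormal scatter.
n = 150
q = rng.lognormal(mean=2.0, sigma=0.8, size=n)           # discharge
c = 5.0 * q ** 1.4 * rng.lognormal(0.0, 0.4, size=n)     # concentration

# Fit log C = a + b log Q by ordinary least squares.
X = np.column_stack([np.ones(n), np.log(q)])
beta, *_ = np.linalg.lstsq(X, np.log(c), rcond=None)
resid = np.log(c) - X @ beta

# Duan's smearing factor: mean of exp(residuals), >= 1 by Jensen's
# inequality, corrects the downward bias of the naive back-transform.
smear = np.mean(np.exp(resid))

def predict_c(q_new):
    """Bias-corrected concentration prediction from the rating curve."""
    return smear * np.exp(beta[0]) * q_new ** beta[1]
```

Summing `predict_c(q)` over the daily flow record gives the load estimate; omitting `smear` systematically underestimates it, which is the statistical bias the key words refer to.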


2001 ◽  
Vol 44 (2) ◽  
pp. 446-461 ◽  
Author(s):  
Jennifer Windsor ◽  
Rochelle L. Milbrath ◽  
Edward J. Carney ◽  
Susan E. Rakowski

Although the general slowing hypothesis of language impairment (LI) is well established, the conventional method to test the hypothesis is controversial. This paper compares the usual method, ordinary least squares regression (OLS), with another method, hierarchical linear modeling with random coefficients (HLM). The analyses used available response time (RT) data from studies of perceptual-motor, cognitive, and language skills of LI and chronological-age-matched (CA) groups. The data set included RT measures from 25 studies investigating 20 different tasks (e.g., auditory detection, mental rotation, and word recognition tasks). OLS and HLM analyses of the RT data yielded very different results. OLS supported general slowing for the LI groups, and indicated that they were significantly slower than CA groups across studies by an overall estimate of 10%. HLM indicated a larger average extent of LI slowing (18%). However, the variability around this average was much greater than that yielded by OLS, and the extent of slowing was not statistically significant. Importantly, HLM showed a significant difference in the RT relation between LI and CA groups across studies, indicating that study-specific slowing, rather than general slowing across studies, was present. A separate HLM analysis of two types of language tasks, picture naming and word recognition, was performed. Although the extent of slowing was equivalent across these tasks, the slowing was minimal (2%) and not significant. Methodological limitations of each analysis to assess general slowing are highlighted.
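The contrast the abstract draws can be illustrated with a toy version of both analyses: a pooled proportional-slowing fit versus study-specific slopes, whose spread is what a random-coefficient model formalizes. The study count matches the abstract, but all RT values and the slowing distribution are invented.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic RT data: 25 studies with study-specific slowing factors.
n_studies, n_obs = 25, 30
slowing = rng.normal(1.15, 0.20, n_studies)       # LI/CA ratio varies by study
ca = rng.uniform(300, 1500, size=(n_studies, n_obs))
li = slowing[:, None] * ca + rng.normal(0, 40, size=(n_studies, n_obs))

# Pooled OLS through the origin: a single overall slowing estimate,
# analogous to regressing all LI RTs on CA RTs at once.
pooled_slope = np.sum(ca * li) / np.sum(ca ** 2)

# Per-study slopes: the between-study variability that pooled OLS hides
# and that a random-coefficient (HLM) analysis models explicitly.
study_slopes = np.sum(ca * li, axis=1) / np.sum(ca ** 2, axis=1)
between_study_sd = study_slopes.std()
```

When `between_study_sd` is large relative to the pooled estimate's apparent precision, an overall "X% slower" summary is misleading even if the pooled fit looks significant, which is the abstract's central point.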


2020 ◽  
Vol 50 ◽  
Author(s):  
Mohan KC ◽  
Euan G. Mason ◽  
Horacio E. Bown ◽  
Grace Jones

Background: Additivity has long been recognised as a desirable property of systems of equations to predict the biomass of components and the whole tree. However, most tree biomass studies report biomass equations fitted using traditional ordinary least-squares regression. Therefore, we aimed to develop models to estimate components, subtotals and above-ground total biomass for a Pinus radiata D. Don biomass dataset using traditional linear and nonlinear ordinary least-squares regressions, and to contrast these equations with the additive procedures of biomass estimation. Methods: A total of 24 ten-year-old trees were felled to assess above-ground biomass. Two broad procedures were implemented for biomass modelling: (a) independent; and (b) additive. For the independent procedure, traditional linear models (LINOLS) with scaled power transformations and y-intercepts and nonlinear power models (NLINOLS) without y-intercepts were compared. The best linear (transformed) models from the independent procedure were further tested in three different additive structures (LINADD1, LINADD2, and LINADD3). All models were evaluated using goodness-of-fit statistics, standard errors of estimates, and residual plots. Results: The LINOLS with scaled power transformations and y-intercepts performed better for all components, subtotals and total above-ground biomass in contrast to NLINOLS, which lacked y-intercepts. The additive model (LINADD3) in a joint generalised linear least-squares regression, also called seemingly unrelated regression (SUR), provided the best goodness-of-fit statistics and residual plots for four out of six components (stem, branch, new foliage and old foliage), two out of three subtotals (foliage and crown), and above-ground total biomass compared to other methods. However, bark, cone and bole biomass were better predicted by the LINOLS method. Conclusions: SUR was the best method to predict biomass for the 24-tree dataset because it provided the best goodness-of-fit statistics with unbiased estimates for 7 out of 10 biomass components. This study may assist silviculturists and forest managers to overcome one of the main problems when using biomass equations fitted independently for each tree component, which is that the sum of the biomasses of the predicted tree components does not necessarily add to the total biomass, as the additive biomass models do.
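The SUR machinery can be sketched as feasible generalized least squares on a stacked two-equation system: equation-by-equation OLS residuals estimate the cross-equation error covariance, which then weights a joint fit. The biomass components, regressors and error correlation below are synthetic, and only two components are used for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic biomass data: two components with correlated errors, 24 trees.
n = 24
dbh = rng.uniform(8.0, 25.0, n)                       # diameter, cm
err = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
stem = 1.2 * dbh + err[:, 0]
crown = 0.4 * dbh + err[:, 1]

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

X1 = np.column_stack([np.ones(n), dbh])
X2 = np.column_stack([np.ones(n), dbh])

# Step 1: equation-by-equation OLS, then estimate the error covariance.
b1, b2 = ols(X1, stem), ols(X2, crown)
R = np.column_stack([stem - X1 @ b1, crown - X2 @ b2])
sigma = R.T @ R / n

# Step 2: feasible GLS on the stacked system (two-equation SUR).
X_big = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y_big = np.concatenate([stem, crown])
omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
beta_sur = np.linalg.solve(X_big.T @ omega_inv @ X_big,
                           X_big.T @ omega_inv @ y_big)

# Here both equations share the same regressors, so SUR reproduces OLS
# exactly (Kruskal's theorem); efficiency gains appear once the regressor
# sets differ. Linear component predictions sum to the total by construction.
```

The additivity payoff is the last point: with components fitted jointly as linear equations, the predicted component biomasses sum exactly to the predicted total, which independent transformed fits do not guarantee.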


Biometrika ◽  
2019 ◽  
Vol 106 (3) ◽  
pp. 716-723
Author(s):  
Mengyu Xu ◽  
Danna Zhang ◽  
Wei Biao Wu

Summary We establish an approximation theory for Pearson’s chi-squared statistics in situations where the number of cells is large, by using a high-dimensional central limit theorem for quadratic forms of random vectors. Our high-dimensional central limit theorem is proved under Lyapunov-type conditions that involve a delicate interplay between the dimension, the sample size, and the moment conditions. We propose a modified chi-squared statistic and introduce an adjusted degrees of freedom. A simulation study shows that the modified statistic outperforms Pearson’s chi-squared statistic in terms of both size accuracy and power. Our procedure is applied to the construction of a goodness-of-fit test for Rutherford’s alpha-particle data.
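The regime the paper studies, where the number of cells grows with the sample size, can be illustrated by simulating Pearson's statistic for a uniform multinomial with many cells and standardizing it by its classical mean and variance. The cell count, sample size and replication count below are arbitrary choices for illustration, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(6)

# Multinomial with a large number of cells relative to the sample size.
k, n_sample, n_rep = 200, 400, 2000
p = np.full(k, 1.0 / k)
expected = n_sample * p

stats = np.empty(n_rep)
for i in range(n_rep):
    counts = rng.multinomial(n_sample, p)
    stats[i] = np.sum((counts - expected) ** 2 / expected)

# Classical theory refers stats to chi-square(k - 1). For multinomial data
# E[X^2] = k - 1 holds exactly, but with k this large the shape of the null
# distribution is better captured by a normal approximation of the
# standardized statistic, which is the regime a high-dimensional CLT covers.
df = k - 1
z = (stats - df) / np.sqrt(2 * df)
```

Comparing the empirical distribution of `z` with a standard normal shows how far the fixed-cell χ² asymptotics drift when cells are numerous and sparsely filled, motivating the adjusted degrees of freedom the abstract proposes.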


Author(s):  
Martin Karlsson ◽  
Tor Iversen ◽  
Henning Øien

An open issue in the economics literature is whether healthcare expenditure (HCE) is so concentrated in the last years before death that the age profiles in spending will change when longevity increases. The seminal article "Ageing of Population and Health Care Expenditure: A Red Herring?" by Zweifel and colleagues argued that age is a distraction in explaining growth in HCE. The argument was based on the observation that age did not predict HCE after controlling for time to death (TTD). The authors were soon criticized for the use of a Heckman selection model in this context. Most of the recent literature makes use of variants of a two-part model and seems to give age some role in the explanation as well. Age seems to matter more for long-term care expenditures (LTCE) than for acute hospital care. When disability is accounted for, the effects of age and TTD diminish. Few articles validate their approach by comparing the properties of different estimation models. In order to evaluate popular models used in the literature and to gain an understanding of the divergent results of previous studies, an empirical analysis based on a claims data set from Germany is conducted. This analysis generates a number of useful insights. There is a significant age gradient in HCE, strongest for LTCE, and costs of dying are substantial. These "costs of dying" have, however, a limited impact on the age gradient in HCE. These findings are interpreted as evidence against the red herring hypothesis as initially stated. The results indicate that the choice of estimation method makes little difference, and where methods do differ, ordinary least squares regression tends to perform better than the alternatives. When validating the methods out of sample and out of period, there is no evidence that including TTD leads to better predictions of aggregate future HCE.
It appears that the literature might benefit from focusing on the predictive power of the estimators instead of their actual fit to the data within the sample.
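The two-part model the literature favours combines a binary model for any spending with a regression for the level of spending among users. A minimal sketch follows, with a numpy Newton/IRLS logistic fit for part one and OLS on log costs for part two; the age range, coefficients and cost process are invented, and the naive retransformation (no smearing correction) is a deliberate simplification.

```python
import numpy as np

rng = np.random.default_rng(7)

# Synthetic claims data: age drives both the probability and level of HCE.
n = 2000
age = rng.uniform(60, 95, n)
p_any = 1.0 / (1.0 + np.exp(-(-6.0 + 0.08 * age)))
has_cost = rng.random(n) < p_any
log_cost = 4.0 + 0.03 * age + rng.normal(0, 0.5, n)
cost = np.where(has_cost, np.exp(log_cost), 0.0)

X = np.column_stack([np.ones(n), age])

# Part 1: logistic regression for P(cost > 0), fitted by Newton/IRLS.
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (has_cost - p))

# Part 2: OLS on log cost among users only.
mask = cost > 0
g, *_ = np.linalg.lstsq(X[mask], np.log(cost[mask]), rcond=None)

def predict(age_new):
    """E[cost | age]: probability of use times expected cost among users."""
    x = np.array([1.0, age_new])
    pr = 1.0 / (1.0 + np.exp(-(x @ beta)))
    return pr * np.exp(x @ g)
```

Adding a TTD covariate to both parts and comparing out-of-sample predictions with and without it mirrors the validation exercise the abstract describes.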


Author(s):  
Chisimkwuo John ◽  
Chukwuemeka O. Omekara ◽  
Godwin Okwara

A defining feature of principal component analysis (PCA) applied to a multivariate data set is its ability to transform correlated, linearly dependent variables into linearly independent principal components. Back-transforming these components, with the samples and variables approximated on a single calibrated plot, gives rise to the PCA biplot. In this work, the predictive property of the PCA biplot was exploited in the visualization of anthropometric measurements, namely weight (kg), height (cm), skinfold (cm), arm muscle circumference AMC (cm), and mid upper arm circumference MUAC (cm), collected from the students of the School of Nursing and Midwifery, Federal Medical Center (FMC), Umuahia, Nigeria. The adequacy and quality of the PCA biplot were assessed, and the predicted samples were then compared with ordinary least squares (OLS) regression predictions, since both predictions make use of an indicative minimization of the error sum of squares. The results suggest that PCA biplot prediction merits further consideration when handling correlated multivariate data sets, as its predictions, with a mean square error (MSE) of 0.00149, outperformed the OLS regression predictions, with an MSE of 29.452.
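The core of biplot-based prediction is a low-rank reconstruction of the centred data matrix via the singular value decomposition; an OLS fit of one variable on the others is the natural benchmark. The sketch below uses synthetic correlated data in place of the anthropometric measurements, and the two MSEs are not strictly comparable (whole-matrix reconstruction versus prediction of a single column), so it illustrates the mechanics rather than reproduces the paper's comparison.

```python
import numpy as np

rng = np.random.default_rng(8)

# Synthetic correlated data: 5 variables driven by one latent factor.
n, p_vars = 100, 5
latent = rng.normal(size=(n, 1))
loadings = rng.normal(1.0, 0.2, size=(1, p_vars))
data = latent @ loadings + rng.normal(0, 0.3, size=(n, p_vars))

# Centre, then form the rank-2 SVD reconstruction -- the biplot approximation.
mean = data.mean(axis=0)
Xc = data - mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
recon = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + mean
mse_pca = np.mean((recon - data) ** 2)

# OLS benchmark: predict the last variable from the other four.
A = np.column_stack([np.ones(n), data[:, :-1]])
b, *_ = np.linalg.lstsq(A, data[:, -1], rcond=None)
mse_ols = np.mean((A @ b - data[:, -1]) ** 2)
```

By the Eckart–Young theorem the rank-k reconstruction is the best possible in least-squares terms, which is the "indicative minimization of the error sum of squares" that makes the biplot comparison with OLS natural.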

