Prediction of watermain failure frequencies using multiple and Poisson regression

A. Asnaashari; E. A. McBean; I. Shahrour; B. Gharabaghi

doi:10.2166/ws.2009.020

Water Science & Technology Water Supply ◽

10.2166/ws.2009.020 ◽

2009 ◽

Vol 9 (1) ◽

pp. 9-19 ◽

Cited By ~ 16

Author(s):

A. Asnaashari ◽

E. A. McBean ◽

I. Shahrour ◽

B. Gharabaghi

Keyword(s):

Multiple Regression ◽

Poisson Regression ◽

Regression Models ◽

Historical Data ◽

Correlation Coefficients ◽

Water Utilities ◽

Failure Frequency ◽

Poisson Models ◽

Important Concern ◽

The City

An important concern for water utility managers is predicting the failure frequency of watermains. Modeling of historical failure data can provide this insight. In this research, two regression-based models are employed: multiple regression and Poisson regression. The models are derived from 10 years of historical data collected for the city of Sanandaj, Iran. Several tests to validate each model are described. A comparison of the correlation coefficients of the two models, together with the multiple regression model's violation of its initial assumptions, shows that multiple regression-based modeling is inadequate.
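
The contrast between the two model families can be sketched on synthetic count data. This is only an illustration, not the authors' analysis: the predictors (`age`, `length`), the coefficients in `true_beta`, and the sample size are all hypothetical, and the Poisson model is fitted with a hand-written iteratively reweighted least squares (IRLS) loop so the example depends on NumPy alone.

```python
import numpy as np

def fit_ols(X, y):
    """Multiple (linear) regression by least squares; X includes an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Poisson regression with a log link, fitted by IRLS.
    The first column of X is assumed to be the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean() + 1e-9)        # start from the intercept-only fit
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtW = X.T * mu                       # IRLS weights equal mu for the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Synthetic "pipe segment" data (all values hypothetical).
rng = np.random.default_rng(0)
n = 500
age = rng.uniform(0, 50, n)                  # pipe age in years
length = rng.uniform(0.1, 2.0, n)            # segment length in km
X = np.column_stack([np.ones(n), age, length])
true_beta = np.array([-1.5, 0.04, 0.5])
y = rng.poisson(np.exp(X @ true_beta)).astype(float)

beta_ols = fit_ols(X, y)
beta_pois = fit_poisson(X, y)

pred_ols = X @ beta_ols                      # can be negative for some segments
pred_pois = np.exp(X @ beta_pois)            # always positive by construction

# Correlation between observed and predicted counts for each model.
r_ols = np.corrcoef(y, pred_ols)[0, 1]
r_pois = np.corrcoef(y, pred_pois)[0, 1]
n_negative_ols = int((pred_ols < 0).sum())
```

Comparing `r_ols` with `r_pois` and checking `n_negative_ols` mirrors, in a simplified way, the two kinds of evidence the abstract mentions: goodness of fit and whether the linear model's assumptions hold for count data.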

A test of inflated zeros for Poisson regression models

Statistical Methods in Medical Research ◽

10.1177/0962280217749991 ◽

2017 ◽

Vol 28 (4) ◽

pp. 1157-1169 ◽

Cited By ~ 1

Author(s):

Hua He ◽

Hui Zhang ◽

Peng Ye ◽

Wan Tang

Keyword(s):

Poisson Regression ◽

Regression Models ◽

Type I Error ◽

Large Body ◽

Poisson Model ◽

Type I ◽

New Approach ◽

Poisson Models ◽

Zero Inflated Poisson Models ◽

Vuong Test

Excessive zeros are common in practice and may cause overdispersion and invalidate inference when fitting Poisson regression models. There is a large body of literature on zero-inflated Poisson models. However, methods for testing whether there are excessive zeros are less well developed. The Vuong test comparing a Poisson and a zero-inflated Poisson model is commonly applied in practice. However, the type I error of the test often deviates seriously from the nominal level, raising serious doubts about the validity of the test in such applications. In this paper, we develop a new approach for testing inflated zeros under the Poisson model. Unlike the Vuong test for inflated zeros, our method does not require a zero-inflated Poisson model to perform the test. Simulation studies show that, compared with the Vuong test, our approach is not only better at controlling the type I error rate but also yields more power.
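
To make the idea of "more zeros than a Poisson model expects" concrete, here is a minimal sketch for the simplest case with no covariates. It is not the test proposed in this paper. It uses the older score statistic of van den Broek (1995) for an intercept-only Poisson model, which, like the paper's approach, needs only the Poisson fit and no zero-inflated model. The simulated data, the mixing proportion of 0.3, and the function name `zero_inflation_score` are assumptions for illustration.

```python
import math
import numpy as np

def zero_inflation_score(y):
    """Score statistic for excess zeros under an intercept-only Poisson model.
    Returns (observed zeros, expected zeros, statistic, p-value)."""
    y = np.asarray(y)
    n = y.size
    ybar = y.mean()                      # Poisson MLE of the mean
    p0 = math.exp(-ybar)                 # P(Y = 0) under the fitted Poisson
    n0 = int((y == 0).sum())
    denom = n * p0 * (1.0 - p0) - n * ybar * p0 ** 2
    stat = (n0 - n * p0) ** 2 / denom
    # Upper tail of chi-square with 1 df: P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return n0, n * p0, stat, p_value

rng = np.random.default_rng(1)
n = 2000
y_poisson = rng.poisson(2.0, n)                      # no excess zeros
structural_zero = rng.random(n) < 0.3                # hypothetical 30% extra zeros
y_inflated = np.where(structural_zero, 0, rng.poisson(2.0, n))

n0_p, exp_p, stat_p, pval_p = zero_inflation_score(y_poisson)
n0_z, exp_z, stat_z, pval_z = zero_inflation_score(y_inflated)
```

For the inflated sample the observed zero count `n0_z` sits far above the expected count `exp_z`, so `stat_z` is large and `pval_z` is tiny; the plain Poisson sample produces a much smaller statistic.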

Comprehensive approach to the tuberculosis indicators assessment of the children population in the Republic of Crimea

Bulletin physiology and pathology of respiration ◽

10.36604/1998-5029-2021-81-78-84 ◽

2021 ◽

pp. 78-84

Author(s):

T. N. Golubova ◽

N. M. Ovsannikova ◽

Z. R. Makhamova

Keyword(s):

Multiple Regression ◽

Regression Models ◽

Correlation Coefficients ◽

Pulmonary Tb ◽

Pairwise Correlations ◽

Multiple Regression Models ◽

Childhood Tb ◽

The Republic ◽

Paired Correlation

Introduction. Childhood tuberculosis (TB) control is relevant because of the peculiarities of the disease course in this age group, and TB incidence in children is an important prognostic epidemiological indicator. Aim. To use multivariate statistical analysis to estimate and predict childhood TB indicators in the Republic of Crimea (RC). Materials and methods. The official TB statistics for the Republic of Crimea for 2014-2018 are used. The calculated means of the indicators are checked for normality using the Kolmogorov-Smirnov and Shapiro-Wilk tests. Pearson correlation analysis is applied to determine pairwise correlations. Stepwise multiple regression analysis is carried out to determine the group conditionality of the indicators; the indicators with which significant pairwise correlations are found are selected as independent variables. Based on the results, multiple regression equations are constructed to predict the values of the dependent variables. The data are processed using Statistica 10.0 software. Results. For childhood TB incidence, strong direct correlations are established with the incidence and prevalence of pulmonary TB among children. The paired correlation coefficient between the incidence of childhood TB and childhood lung TB and the detection of active TB patients in preventive examinations of children ranged from 0.63 to 0.72. For the prevalence of TB among children, strong direct correlations were found with the incidence of TB and pulmonary TB in children. Multiple correlation coefficients for the incidence and prevalence of childhood TB exceeded the values of the paired correlation coefficients and were in the range of 0.93 to 0.98 (p<0.001), indicating greater significance of the group conditionality of the indicators. Coefficients of determination (R²) were between 0.87 and 0.96. Multiple regression models were built for childhood TB incidence, childhood lung TB incidence, childhood TB prevalence, and childhood lung TB prevalence. Conclusion. The strong direct pairwise correlations found for childhood TB incidence and prevalence and childhood pulmonary TB incidence and prevalence can serve as prognostic criteria and reflect the quality of antituberculosis interventions. High values of the paired correlation coefficient between childhood TB incidence and childhood pulmonary TB and detection of patients with active TB in preventive examinations of children are a criterion of the quality of both TB services and primary care, which can prevent the spread of TB and improve the epidemic situation of TB in Crimea. The calculated multiple regression models for the studied indicators can serve the needs of practical forecasting in healthcare.

A comparison of poisson regression models fitted to multiway summary tables and cox's survival model using data from a blood pressure screening in the city of Bergen, Norway

Statistics in Medicine ◽

10.1002/sim.4780091005 ◽

1990 ◽

Vol 9 (10) ◽

pp. 1157-1165 ◽

Cited By ~ 10

Author(s):

Randi Selmer

Keyword(s):

Blood Pressure ◽

Poisson Regression ◽

Regression Models ◽

Survival Model ◽

Blood Pressure Screening ◽

Using Data ◽

The City

Evidence of Adaptation to Increasing Temperatures

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph17010097 ◽

2019 ◽

Vol 17 (1) ◽

pp. 97 ◽

Cited By ~ 2

Author(s):

Lisbeth Weitensfelder ◽

Hanns Moshammer

Keyword(s):

Poisson Regression ◽

Regression Models ◽

Age Groups ◽

Vulnerable Groups ◽

Mortality Data ◽

Optimal Temperature ◽

Average Temperature ◽

Time Period ◽

Middle Europe ◽

The City

In times of rising temperatures, the question arises of how the human body adapts. If a changing climate leads to adaptation, time series analysis should reveal a shift in optimal temperatures. The city of Vienna is especially affected by climate change due to its location in the Alpine Range in Middle Europe. Based on mortality data, we calculated shifts in optimal temperature over a period of 49 years in Vienna with Poisson regression models. Results show a shift in optimal temperature, with optimal temperature increasing more than average temperature. Hence, results clearly show an adaptation process, with more adaptation to warmer than to colder temperatures. Nevertheless, some age groups remain more vulnerable than others and less able to adapt. Further research focusing on vulnerable groups should be encouraged.
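
One way an "optimal temperature" can be read off a Poisson regression is sketched below. The abstract does not give the model specification, so everything here is an assumption for illustration: daily deaths are modeled with a log link and a quadratic term in temperature, the optimum is the vertex of that parabola, and the two simulated periods (with optima of 18 °C and 20 °C) are synthetic.

```python
import numpy as np

def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Poisson regression (log link) by IRLS; first column of X is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu
        XtW = X.T * mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def optimal_temperature(temp_c, deaths, center=20.0, scale=10.0):
    """Temperature of minimum expected mortality under
    log(mu) = b0 + b1*t + b2*t^2, with t = (temp - center) / scale."""
    t = (np.asarray(temp_c, dtype=float) - center) / scale
    X = np.column_stack([np.ones_like(t), t, t ** 2])
    b0, b1, b2 = fit_poisson(X, np.asarray(deaths, dtype=float))
    if b2 <= 0:
        raise ValueError("fitted curve has no minimum")
    return center + scale * (-b1 / (2.0 * b2))   # vertex of the parabola

def simulate_period(rng, t_opt, n_days=3000):
    """Synthetic daily temperatures and death counts with a known optimum."""
    temp = rng.uniform(-10.0, 35.0, n_days)
    mu = 40.0 * np.exp(0.002 * (temp - t_opt) ** 2)
    return temp, rng.poisson(mu)

rng = np.random.default_rng(2)
temp_early, deaths_early = simulate_period(rng, t_opt=18.0)
temp_late, deaths_late = simulate_period(rng, t_opt=20.0)

est_early = optimal_temperature(temp_early, deaths_early)
est_late = optimal_temperature(temp_late, deaths_late)
shift = est_late - est_early                     # estimated shift in optimum
```

Fitting the same model to separate time windows and comparing the estimated optima is one simple way to express a "shift in optimal temperature"; a real analysis would also need to handle trends, seasonality, and lag effects.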

Nonparametric Models for Identifying Gaps in Message Feeds

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v10i1.8337 ◽

2018 ◽

Vol 10 (1) ◽

Author(s):

Andrew Walsh

Keyword(s):

Nonparametric Regression ◽

Poisson Regression ◽

Syndromic Surveillance ◽

Regression Models ◽

Computation Time ◽

Time Of Day ◽

Nonparametric Model ◽

Nonparametric Models ◽

Poisson Models ◽

Data Requirements

Objective. Characterize the behavior of nonparametric regression models for message arrival probability as outage detection tools. Introduction. Timely and accurate syndromic surveillance depends on continuous data feeds from healthcare facilities. Typical outlier detection methodologies in syndromic surveillance compare predictions of counts for an interval to observed event counts, either to detect increases in volume associated with public health incidents or decreases in volume associated with compromised data transmission. Accurate predictions of total facility volume need to account for significant variance associated with the time of day and week; at the extreme are facilities which are only open during limited hours and on select days. Models need to account for the cross-product of all hours and days, creating a significant data burden. Timely detection of outages may require sub-hour aggregation, increasing this burden by increasing the number of intervals for which parameters need to be estimated. Nonparametric models for the probability of message arrival offer an alternative approach to generating predictions. The data requirements are reduced by assuming some time-dependent structure in the data rather than allowing each interval to be independent of all others, allowing for predictions at sub-hour intervals. Methods. Healthcare facility data were collected as HL7 messages via the EpiCenter syndromic surveillance system from June 1, 2017 through August 31, 2017. A total of 713 facilities sent at least 1,000 messages during this period and were included in the analysis. Standard Poisson regression models were fit to counts of messages per quarter hour. Predictors were indicators for day of week, hour of day, and quarter of hour, along with interaction terms between them. Nonparametric logistic regression models were fit to data on the presence or absence of any message for each minute of the first two months of the study period, using the minute within the week as a predictor. The last month of data was scanned for outages at 15-minute intervals by calculating, per facility, the probability of no messages since the last received message as: $P(\text{gap from } m_{\text{last}} \text{ to } m_{\text{now}}) = \prod_{t} \bigl(1 - P_{\text{message}}(t)\bigr)$. Four consecutive intervals with probability below 1e-10 were considered outages. Results. A total of 12,710,275 ADT A04 messages were received from 713 facilities from June 1, 2017 through August 31, 2017. Estimation of Poisson regression models averaged 1 minute, while nonparametric models averaged 1.5 minutes to estimate. Poisson models required 672 parameters to specify, whereas nonparametric models required 29. Calculating predictions from fitted models averaged 0.2 seconds for Poisson models and 2 seconds for nonparametric models. Although predictions from the two models are not on identical scales and thus not directly comparable, they correlated well with each other, with an average correlation of 0.8. The nonparametric regression method detected 175 resolved outages and 9 open outages in August 2017. The resolved outages lasted an average of 1.5 days (1.75 hours to 15 days). The likelihood of these outages averaged 6e-13 (3e-160 to 4e-11). Figure 1 illustrates how the nonparametric models can be used in a dashboard for all 713 connections. The likelihood of an outage is available for each facility based on how long it has been since the last message was received; this can be updated every minute as needed. Figure 2 illustrates the predictions from a nonparametric model for a single facility and a detected outage. Conclusions. Nonparametric regression models of message arrival demonstrated suitable performance for use in detecting connection outages. Compared to standard Poisson regression models, computation time for nonparametric models was longer but within acceptable ranges for operational needs, and storage was significantly reduced. Further, storage and computation time for standard models will increase if greater time granularity is desired, whereas the nonparametric models require no additional storage or computation. Model predictions were sufficiently similar between both models for the two to give comparable performance in detecting outages. Given the greater time flexibility of the nonparametric models and the smaller data requirements for initial model estimation (due to fewer estimated parameters), the nonparametric approach represents a promising new option for monitoring syndromic surveillance data quality.
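
The gap probability described in the Methods, a product of per-minute "no message" probabilities since the last received message, can be sketched as follows. This is a hedged illustration rather than the EpiCenter implementation: the function names, the clipping constant, the default threshold of 1e-10 (an assumed reading of the abstract's threshold), and the interpretation of "four consecutive intervals" as four successive 15-minute checks below the threshold are all assumptions. The product is accumulated in log space to avoid underflow for long gaps.

```python
import math

def gap_log_probability(p_message):
    """log of prod_t (1 - p_t): the log-probability of receiving no message
    over the given minutes. Probabilities are clipped just below 1 to keep
    the logarithm finite."""
    return sum(math.log1p(-min(p, 1.0 - 1e-12)) for p in p_message)

def detect_outage(minute_probs, interval=15, threshold=1e-10, consecutive=4):
    """Scan per-minute arrival probabilities since the last received message.
    Returns the 0-based index of the interval at which the gap probability has
    been below `threshold` for `consecutive` checks in a row, or None."""
    log_threshold = math.log(threshold)
    log_p = 0.0
    run = 0
    for start in range(0, len(minute_probs), interval):
        log_p += gap_log_probability(minute_probs[start:start + interval])
        run = run + 1 if log_p < log_threshold else 0
        if run >= consecutive:
            return start // interval
    return None

# A busy facility: 20% chance of a message in any minute, silent for 5 hours.
busy = detect_outage([0.2] * 300)

# A quiet facility: 0.1% chance per minute, silent for the same 5 hours.
quiet = detect_outage([0.001] * 300)
```

With these hypothetical inputs, the busy facility is flagged after a few hours of silence while the quiet facility is not flagged at all, which is the behavior that makes a per-facility arrival model useful for outage detection.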

What motivates Chinese physicians to provide online counseling services in Internet hospitals: online reputation or offline reputation? (Preprint)

10.2196/preprints.24498 ◽

2020 ◽

Author(s):

Ronghua Xu ◽

Tingting Zhang ◽

Qingpeng Zhang

Keyword(s):

Regression Models ◽

Counseling Services ◽

Online Counseling ◽

Consulting Services ◽

Online Consultation ◽

Number Of Patients ◽

Professional Titles ◽

Online Reputation ◽

Chinese Physicians ◽

The City

BACKGROUND Internet hospitals, or e-hospitals, are one kind of e-health platform in China that provide novel channels through which physicians present their medical or health-care knowledge to patients and provide online counseling services. The sustainable development of Internet hospitals and e-health platforms relies on the participation of both patients and physicians, especially on the provision of health consultation services by physicians. OBJECTIVE The objective of our study was to explore the factors motivating Chinese physicians to provide online health counseling services from the perspectives of their online reputation and offline reputation. METHODS We collected data on 141,030 physicians from 6,173 offline hospitals and 350 cities on WeDoctor, an Internet hospital platform authorized by the China Health and Family Planning Committee. We selected the physicians’ online consultation volume, the total number of counseling conversations from all channels of the platform, as the dependent variable, reflecting the actual online counseling behaviors of the physicians on the platform. Based on reputation theories and prior studies, we incorporated patients’ feedback as the physicians’ online reputation (i.e. patients’ comments and their satisfaction scores), and incorporated the physicians’ offline professional status as the offline reputation (i.e. professional titles and the rankings of their offline working hospitals). We also examined the moderating effects of the city levels where the physicians lived offline and the number of patients who were watching the physicians online. Eight research hypotheses were proposed. Step-wise linear regression models were used to test our hypotheses. The Durbin-Watson test and robustness tests were also conducted to check the fit and reliability of our models.
RESULTS The regression models showed that 1) physicians’ online reputation, including the number of comments written by patients (beta=0.588, P<0.001) and the satisfaction scores (beta=0.034, P<0.01), significantly and positively influences physicians’ online counseling behaviors; 2) physicians’ offline reputation, including their professional titles (beta=-0.084, P<0.001) and the hospital rankings (beta=-0.163, P<0.001), significantly and negatively influences physicians’ online counseling behaviors; 3) the city levels where the physicians lived strengthen the negative relationship between their offline hospital rankings and their online consulting services (beta=-0.177, P<0.001), indicating that physicians of higher offline reputation spend less time on online counseling, possibly due to a relatively heavier offline workload; 4) the number of watching patients weakens the positive relationship between patients’ comments and physicians’ online consulting services (beta=-0.216, P<0.001), indicating that the watching patients may switch from online consultation to offline hospital visits after using the Internet hospitals. CONCLUSIONS This study contributes to the literature on physicians’ online counseling behaviors in Internet hospitals by verifying the contrasting effects of online reputation and offline reputation. It also contributes to motivation theory by separating online reputation from offline reputation when the acting entities are constrained by limited time and effort. This study can also provide practical insights for hospital managers to better arrange online counseling services and for policy makers to incorporate patients’ online feedback into the overall evaluation of physicians’ reputation.

Anticipated help-seeking for cancer symptoms before and after the coronavirus pandemic: results from the Onco-barometer population survey in Spain

British Journal of Cancer ◽

10.1038/s41416-021-01382-1 ◽

2021 ◽

Author(s):

Dafina Petrova ◽

Marina Pollán ◽

Miguel Rodriguez-Barranco ◽

Dunia Garrido ◽

Josep M. Borrás ◽

...

Keyword(s):

Poisson Regression ◽

Help Seeking ◽

Regression Models ◽

Population Survey ◽

Waiting Times ◽

Perceived Barriers ◽

Older Individuals ◽

Cancer Symptoms ◽

Before And After ◽

Diagnostic Delays

Abstract Background The patient interval (the time patients wait before consulting their physician after noticing cancer symptoms) contributes to diagnostic delays. We compared anticipated help-seeking times for cancer symptoms and perceived barriers to help-seeking before and after the coronavirus pandemic. Methods Two waves (pre-Coronavirus: February 2020, N = 3269; and post-Coronavirus: August 2020, N = 1500) of the Spanish Onco-barometer population survey were compared. The international ABC instrument was administered. Pre–post comparisons were performed using multiple logistic and Poisson regression models. Results There was a consistent and significant increase in anticipated times to help-seeking for 12 of 13 cancer symptoms, with the largest increases for breast changes (OR = 1.54, 95% CI 1.22–1.96) and unexplained bleeding (OR = 1.50, 1.26–1.79). Respondents were more likely to report barriers to help-seeking in the post wave, most notably worry about what the doctor may find (OR = 1.58, 1.35–1.84) and worry about wasting the doctor’s time (OR = 1.48, 1.25–1.74). Women and older individuals were the most affected. Conclusions Participants reported longer waiting times to help-seeking for cancer symptoms after the pandemic. There is an urgent need for public interventions encouraging people to consult their physicians with symptoms suggestive of cancer and counteracting the main barriers perceived during the pandemic situation.
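
The odds ratios and intervals reported above can be related back to logistic regression coefficients with a short calculation. The sketch assumes the intervals are standard Wald intervals, exp(beta ± 1.96·SE), which the abstract does not state; the helper name `wald_summary` is hypothetical. Under that assumption the reported odds ratio should sit at the geometric midpoint of its interval, which gives a quick consistency check.

```python
import math

def wald_summary(or_hat, ci_low, ci_high, z=1.96):
    """Recover the log-odds coefficient and its standard error from a reported
    odds ratio and 95% CI, assuming a Wald interval exp(beta +/- z * SE)."""
    beta = math.log(or_hat)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    geometric_mid = math.exp((math.log(ci_low) + math.log(ci_high)) / 2.0)
    return beta, se, geometric_mid

# Values as reported in the abstract.
reported = {
    "unexplained bleeding": (1.50, 1.26, 1.79),
    "worry about what the doctor may find": (1.58, 1.35, 1.84),
    "worry about wasting the doctor's time": (1.48, 1.25, 1.74),
}

summaries = {name: wald_summary(*vals) for name, vals in reported.items()}

# Largest gap between a reported OR and the geometric midpoint of its CI.
max_gap = max(abs(summaries[name][2] - vals[0]) for name, vals in reported.items())
```

A small `max_gap` (within rounding of the two-decimal reporting) is what one would expect if the intervals were computed on the log-odds scale.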

242 Effect of trucking distance on sale price of beef calf and feeder cattle lots sold through video auctions from 2010 through 2018

Journal of Animal Science ◽

10.1093/jas/skaa054.017 ◽

2020 ◽

Vol 98 (Supplement_3) ◽

pp. 10-11

Author(s):

Esther D McCabe ◽

Mike E King ◽

Karol E Fike ◽

Maggie J Smith ◽

Glenn M Rogers ◽

...

Keyword(s):

Multiple Regression ◽

Regression Models ◽

Average Weight ◽

Sale Price ◽

Feeder Cattle ◽

A Value ◽

Multiple Regression Models ◽

Determine Effect

Abstract The objective was to determine the effect of trucking distance on the sale price of beef calf and feeder cattle lots sold through Superior Livestock Video Auctions from 2010 through 2018. Data analyzed were collected from 211 livestock video auctions. There were 42,043 beef calf lots and 19,680 feeder cattle lots used in these analyses. Six states of delivery (Colorado, Iowa, Kansas, Nebraska, Oklahoma, and Texas) comprised 70% of calf lots and 83% of feeder cattle lots and were used in these analyses. All lot characteristics that could be accurately quantified or categorized were used to develop multiple regression models that evaluated effects of independent factors using backwards selection. A value of P < 0.05 was used to maintain a factor in the final models. Based upon reported state of origin and state of delivery, lots were categorized into one of the following trucking distance categories: 1) Within-State, 2) Short-Haul, 3) Medium-Haul, and 4) Long-Haul. Average weight and number of calves in lots analyzed were 259.2 ± 38.4 kg BW and 100.6 ± 74.3 head, respectively. Average weight and number of feeder cattle in lots analyzed were 358.4 ± 34.3 kg BW and 110.6 ± 104.1 head, respectively. Beef calf lots hauled Within-State sold for more ($169.24/45.36 kg; P < 0.0001) than lots in the other trucking distance categories (Table 1). Long-Haul calf lots sold for the lowest (P < 0.0001) price ($166.70/45.36 kg). Within-State and Short-Haul feeder cattle lots sold for the greatest (P < 0.0001) price ($149.96 and $149.81/45.36 kg, respectively; Table 2). Long-Haul feeder cattle lots sold for the lowest (P < 0.0001) price, $148.43/45.36 kg. These results indicate there is a price advantage for lots expected to be hauled shorter distances, likely because of the cost and risk associated with transportation.
