The Concepts of Pseudo Compound Poisson and Partition Representations in Discrete Probability

2015 ◽  
Vol 2015 ◽  
pp. 1-6
Author(s):  
Werner Hürlimann

The mathematical/statistical concepts of pseudo compound Poisson and partition representations in discrete probability are reviewed and clarified. A combinatorial interpretation of the convolution of geometric distributions is obtained in terms of a variant of Newton’s identities. The practical use of the twofold convolution leads to an improved goodness-of-fit for a data set from automobile insurance that had until now not been fitted satisfactorily.
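The twofold convolution mentioned above can be illustrated numerically; a minimal sketch, assuming geometric pmfs with support 0, 1, 2, … and illustrative parameters (not fitted values from the paper):

```python
# A minimal sketch of the twofold convolution of two geometric distributions.
# Assumed pmf: P(X = k) = p * (1 - p)**k for k = 0, 1, 2, ... (truncated at kmax);
# the parameters 0.6 and 0.4 are illustrative.

def geom_pmf(p, kmax):
    """Truncated geometric pmf on {0, ..., kmax}."""
    return [p * (1 - p) ** k for k in range(kmax + 1)]

def convolve(f, g):
    """Discrete convolution: pmf of the sum of two independent counts."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

f = geom_pmf(0.6, 200)
g = geom_pmf(0.4, 200)
h = convolve(f, g)  # pmf of the twofold convolution
```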

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 505
Author(s):  
Lluís Bermúdez ◽  
Dimitris Karlis

A multivariate INAR(1) regression model based on the Sarmanov distribution is proposed for modelling claim counts from an automobile insurance contract with different types of coverage. The correlation between claims from different coverage types is considered jointly with the serial correlation between the observations of the same policyholder over time. Several models based on the multivariate Sarmanov distribution are analyzed. The new models retain all the advantages of the MINAR(1) regression model while allowing for a more flexible dependence structure through the Sarmanov distribution. The models are fitted to a real panel data set, and their goodness of fit and computational efficiency are discussed.
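An INAR(1) process of the kind underlying these models is commonly defined through binomial thinning, X_t = α ∘ X_{t−1} + ε_t. A minimal univariate simulation sketch with illustrative parameters (the paper's multivariate Sarmanov specification is considerably more involved):

```python
import math
import random

def inar1_path(alpha, lam, n, x0=0, seed=1):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where ∘ is binomial thinning
    (each of the previous counts survives with probability alpha) and
    eps_t ~ Poisson(lam). All parameter values are illustrative."""
    rng = random.Random(seed)

    def poisson(mean):
        # Knuth's multiplication sampler, adequate for small means
        limit = math.exp(-mean)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    path = [x0]
    for _ in range(n):
        survivors = sum(1 for _ in range(path[-1]) if rng.random() < alpha)
        path.append(survivors + poisson(lam))
    return path
```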


Author(s):  
Raul E. Avelar ◽  
Karen Dixon ◽  
Boniphace Kutela ◽  
Sam Klump ◽  
Beth Wemple ◽  
...  

The calibration of safety performance functions (SPFs) is a mechanism included in the Highway Safety Manual (HSM) to adjust SPFs in the HSM for use in intended jurisdictions. Critically, the quality of the calibration procedure must be assessed before using the calibrated SPFs. Multiple resources to aid practitioners in calibrating SPFs have been developed in the years following the publication of the HSM 1st edition. Similarly, the literature suggests multiple ways to assess the goodness-of-fit (GOF) of a calibrated SPF to a data set from a given jurisdiction. This paper uses the calibration results of multiple intersection SPFs to a large Mississippi safety database to examine the relations between multiple GOF metrics. The goal is to develop a sensible single index that leverages the joint information from multiple GOF metrics to assess overall quality of calibration. A factor analysis applied to the calibration results revealed three underlying factors explaining 76% of the variability in the data. From these results, the authors developed an index and performed a sensitivity analysis. The key metrics were found to be, in descending order: the deviation of the cumulative residual (CURE) plot from the 95% confidence area, the mean absolute deviation, the modified R-squared, and the value of the calibration factor. This paper also presents comparisons between the index and alternative scoring strategies, as well as an effort to verify the results using synthetic data. The developed index is recommended to comprehensively assess the quality of the calibrated intersection SPFs.
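Two of the key metrics named above have simple closed forms. A hedged sketch of the HSM-style calibration factor, C = Σ observed / Σ predicted, and the mean absolute deviation evaluated at C (the counts below are made-up illustrative numbers):

```python
# Hedged sketch of two calibration GOF quantities; observed/predicted
# crash counts here are illustrative, not from the Mississippi database.

def calibration_factor(observed, predicted):
    """HSM-style calibration factor: total observed over total predicted."""
    return sum(observed) / sum(predicted)

def mean_absolute_deviation(observed, predicted, c):
    """Mean absolute deviation of observed counts from calibrated predictions."""
    return sum(abs(o - c * p) for o, p in zip(observed, predicted)) / len(observed)

obs = [3, 1, 4]
pred = [2, 2, 2]
c = calibration_factor(obs, pred)
mad = mean_absolute_deviation(obs, pred, c)
```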


2021 ◽  
Vol 503 (2) ◽  
pp. 2688-2705
Author(s):  
C Doux ◽  
E Baxter ◽  
P Lemos ◽  
C Chang ◽  
A Alarcon ◽  
...  

ABSTRACT Beyond-ΛCDM physics or systematic errors may cause subsets of a cosmological data set to appear inconsistent when analysed assuming ΛCDM. We present an application of internal consistency tests to measurements from the Dark Energy Survey Year 1 (DES Y1) joint probes analysis. Our analysis relies on computing the posterior predictive distribution (PPD) for these data under the assumption of ΛCDM. We find that the DES Y1 data have an acceptable goodness of fit to ΛCDM, with a probability of finding a worse fit by random chance of p = 0.046. Using numerical PPD tests, supplemented by graphical checks, we show that most of the data vector appears completely consistent with expectations, although we observe a small tension between large- and small-scale measurements. A small part (roughly 1.5 per cent) of the data vector shows an unusually large departure from expectations; excluding this part of the data has negligible impact on cosmological constraints, but does significantly improve the p-value to 0.10. The methodology developed here will be applied to test the consistency of DES Year 3 joint probes data sets.
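The p-values quoted above follow the usual posterior predictive logic: the fraction of replicated data sets whose test statistic is at least as extreme as the observed one. A minimal sketch (not the DES pipeline):

```python
def ppd_p_value(stat_obs, stats_rep):
    """Posterior predictive p-value: fraction of replicated data sets whose
    test statistic is at least as extreme as the observed one."""
    return sum(1 for s in stats_rep if s >= stat_obs) / len(stats_rep)

# Illustrative use with made-up statistics from four replications:
p = ppd_p_value(5.0, [1.0, 2.0, 6.0, 7.0])
```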


2010 ◽  
Vol 2 (2) ◽  
pp. 38-51 ◽  
Author(s):  
Marc Halbrügge

Keep it simple - A case study of model development in the context of the Dynamic Stocks and Flows (DSF) task

This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open-ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models, while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measure based on rank orders and sequence matching techniques is proposed instead. This measure, when applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.
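The paper's rank-order/sequence-matching measure is not reproduced here; as a loose analogue, Python's difflib computes a similarity ratio between two orderings of events:

```python
import difflib

# Illustrative orderings of task events; the paper's actual measure combines
# rank orders with sequence matching and is not reproduced here.
model_order = ["A", "B", "C", "D"]
human_order = ["A", "C", "B", "D"]

# ratio = 2 * (number of matched elements) / (total length of both sequences)
ratio = difflib.SequenceMatcher(None, model_order, human_order).ratio()
```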


Symmetry ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 80 ◽  
Author(s):  
Martynas Narmontas ◽  
Petras Rupšys ◽  
Edmundas Petrauskas

In this work, we employ stochastic differential equations (SDEs) to model tree stem taper. SDE stem taper models have some theoretical advantages over the commonly employed regression-based stem taper modeling techniques, as SDE models have both simple analytic forms and a high level of accuracy. We perform fixed- and mixed-effects parameter estimation for the stem taper models by developing an approximated maximum likelihood procedure and using a data set of longitudinal measurements from 319 mountain pine trees. The symmetric Vasicek-type and asymmetric Gompertz-type diffusion processes used adequately describe stem taper evolution. The proposed SDE stem taper models are compared to four regression stem taper equations and four volume equations. Overall, the best goodness-of-fit statistics are produced by the mixed-effects SDE stem taper models. All results are obtained in the Maple computer algebra system.
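A Vasicek-type SDE, dX_t = κ(θ − X_t)dt + σ dW_t, can be simulated with a simple Euler–Maruyama scheme. A minimal sketch with illustrative parameters (the paper's mixed-effects maximum likelihood estimation is far more involved):

```python
import math
import random

def vasicek_path(x0, kappa, theta, sigma, dt, n, seed=42):
    """Euler-Maruyama simulation of dX = kappa*(theta - X)*dt + sigma*dW.
    All parameter values used with this sketch are illustrative."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x.append(x[-1] + kappa * (theta - x[-1]) * dt + sigma * dw)
    return x
```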


2020 ◽  
Vol 21 (15) ◽  
pp. 5280
Author(s):  
Irini Furxhi ◽  
Finbarr Murphy

The practice of non-testing approaches in nanoparticle hazard assessment is necessary to identify and classify potential risks in a cost-effective and timely manner. Machine learning techniques have been applied in the field of nanotoxicology with encouraging results. A neurotoxicity classification model for diverse nanoparticles is presented in this study. A data set compiled from multiple literature sources, consisting of nanoparticle physicochemical properties, exposure conditions, and in vitro characteristics, is used to predict cell viability. Pre-processing techniques were applied, including normalization methods and two supervised instance methods: a synthetic minority over-sampling technique to address biased predictions, and the production of subsamples via bootstrapping. The classification model was developed using random forest, and goodness-of-fit together with additional robustness and predictability metrics was used to evaluate performance. Information gain analysis identified exposure dose and duration, toxicological assay, cell type, and zeta potential as the five most important attributes for predicting neurotoxicity in vitro. This is the first tissue-specific machine learning tool for predicting neurotoxicity caused by nanoparticles in in vitro systems. The model performs better than non-tissue-specific models.
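The information gain analysis mentioned above ranks attributes by the reduction in label entropy achieved by splitting on them. A minimal sketch (not the study's actual pipeline):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into `groups` by an attribute."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
```

For a perfectly separating attribute on balanced binary labels, the gain equals the full one bit of label entropy.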


Biostatistics ◽  
2020 ◽  
Author(s):  
Chien-Lin Su ◽  
Robert W Platt ◽  
Jean-François Plante

Summary Recurrent event data are commonly encountered in observational studies where each subject may experience a particular event repeatedly over time. In this article, we aim to compare cumulative rate functions (CRFs) of two groups when treatment assignment may depend on the unbalanced distribution of confounders. Several estimators based on pseudo-observations are proposed to adjust for the confounding effects, namely inverse probability of treatment weighting estimator, regression model-based estimators, and doubly robust estimators. The proposed marginal regression estimator and doubly robust estimators based on pseudo-observations are shown to be consistent and asymptotically normal. A bootstrap approach is proposed for the variance estimation of the proposed estimators. Model diagnostic plots of residuals are presented to assess the goodness-of-fit for the proposed regression models. A family of adjusted two-sample pseudo-score tests is proposed to compare two CRFs. Simulation studies are conducted to assess finite sample performance of the proposed method. The proposed technique is demonstrated through an application to a hospital readmission data set.
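The inverse probability of treatment weighting estimator referenced above assigns each subject the weight 1/e(x) if treated and 1/(1 − e(x)) otherwise, where e(x) is the propensity score. A minimal sketch with illustrative values:

```python
def iptw_weights(treated, propensity):
    """Plain IPTW weights: 1/e for treated subjects, 1/(1 - e) for controls,
    where e is the estimated propensity score. Inputs here are illustrative."""
    return [1 / e if t else 1 / (1 - e) for t, e in zip(treated, propensity)]

# Example: one treated and one control subject, both with propensity 0.5
w = iptw_weights([1, 0], [0.5, 0.5])
```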


1987 ◽  
Vol 44 (8) ◽  
pp. 1432-1442 ◽  
Author(s):  
Kenneth H. Reckhow ◽  
Robert W. Black ◽  
Thomas B. Stockton Jr. ◽  
J. David Vogt ◽  
Judith G. Wood

A large historical data set from the Adirondack region of New York was compiled to study the relationship between water chemistry variables associated with acid precipitation and the presence/absence of selected fish species. The data set was used to examine simple statistical models for fish presence/absence, as a function of the water chemistry variables, for brook trout (Salvelinus fontinalis), lake trout (Salvelinus namaycush), white sucker (Catostomus commersoni), and yellow perch (Perca flavescens). Of these models, only those for brook trout and lake trout were found to be acceptable based on statistical goodness-of-fit criteria; thus, parameters for models of these two species alone were estimated using maximum likelihood logistic regression. Candidate models for brook trout and lake trout were then examined, with particular consideration for the problems associated with model misspecification, errors-in-variables, and multicollinearity. For each of the two species, a model was recommended that may be used to predict the effect of changes in lake acidification on species presence/absence in lakes in the Adirondack region.
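Logistic regression models of this kind predict presence probability as a logistic function of water chemistry covariates. A hedged sketch with a single covariate and hypothetical coefficients (not the paper's fitted values):

```python
import math

def presence_probability(ph, b0, b1):
    """Logistic presence/absence model with a single covariate (lake pH here):
    P(present) = 1 / (1 + exp(-(b0 + b1 * pH))).
    The coefficients are hypothetical, not the paper's fitted values."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * ph)))
```

With a positive pH coefficient, predicted presence probability rises as acidification decreases, which is the qualitative pattern such models are used to capture.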


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 4315-4315
Author(s):  
Shoichi Nagakura ◽  
Tetsuyuki Kiyokawa ◽  
Michihiro Hidaka ◽  
Takahiro Yano ◽  
Kazutaka Sunami ◽  
...  

Abstract BACKGROUND: Despite the recent increase in reduced-intensity conditioning (RIC) transplantation, mortality rates after RIC and myeloablative conditioning (MAC) HSCT remain high, and hepatic veno-occlusive disease (VOD) cannot be accurately predicted. OBJECTIVE: To determine the value of risk factors associated with the development of VOD after allogeneic HSCT with RIC and MAC. Estimating VOD risk based on clinical factors may further improve the results of allogeneic HSCT. PATIENTS AND METHODS: A retrospective review of 415 consecutive allogeneic HSCTs was performed with attention to VOD, pre-transplant factors, and laboratory data at five hematopoietic cell transplantation centers between 2000 and 2005. Patients underwent transplantation with MAC (n=247) or RIC (n=168). Main outcomes and risk factors were analyzed in multivariable analyses (a logistic regression model) for RIC and MAC. Three kinds of laboratory data sets were analyzed: pre-transplant (day −10), post-transplant (day 20), and the differences from pre-transplant to post-transplant. RESULTS: VOD occurred in 65 of 415 (15.7%) transplant recipients; 40 of 247 (16.1%) with MAC and 25 of 168 (14.9%) with RIC. Multivariate analyses identified risk factors for the development of VOD with MAC (albumin level, creatinine level) and with RIC (HCT-CI, number of prior chemotherapy regimens, ALT) in the pre-transplant laboratory data set. Risk factors for VOD were also identified in the post-transplant and differences data sets (Table). The Akaike information criterion (AIC) of the risk factors for the differences data was better than for the post-transplant data. CONCLUSION: Our results provide risk factors for VOD with MAC and RIC. Estimating VOD risk before transplantation may be useful for the selection of conditioning regimens. Differences in laboratory data over the time course of transplantation may be useful for the early diagnosis of VOD.
Table. Multivariate risk factors for VOD (odds ratio, P-value) by conditioning regimen and laboratory data set.

MAC       Pre-transplant data    Post-transplant data   Differences data
          OR       P-value       OR       P-value       OR       P-value
Age       -        -             0.945    0.0090        -        -
Alb       0.290    0.0125        -        -             -        -
Cr        10.204   0.0307        1.786    0.0039        1.984    0.0139
TPro      -        -             0.358    0.0019        -        -
TBil      -        -             1.385    0.0027        1.314    0.0037
Ara-C     -        -             5.000    0.0139        -        -
Goodness of fit (AIC): 106.727 | 126.499 | 86.931

RIC       Pre-transplant data    Post-transplant data   Differences data
          OR       P-value       OR       P-value       OR       P-value
Sex       -        -             3.401    0.0446        -        -
HCT-CI    3.922    0.0050        2.000    0.0123        -        -
ImpScore  2.000    0.0314        -        -             -        -
TPro      -        -             0.366    0.0091        -        -
TBil      -        -             1.675    0.0042        2.273    0.0004
ALT       0.969    0.0432        -        -             -        -
CY        -        -             -        -             5.682    0.0447
Goodness of fit (AIC): 61.552 | 91.09 | 52.808
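The AIC comparison above uses the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood; lower values indicate a better trade-off of fit against complexity. A minimal sketch:

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2*ln(L); lower values are better."""
    return 2 * k - 2 * log_likelihood
```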


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e18098-e18098
Author(s):  
John Frownfelter ◽  
Sibel Blau ◽  
Ray D. Page ◽  
John Showalter ◽  
Kelly Miller ◽  
...  

e18098 Background: Artificial intelligence (AI) for predictive analytics has been studied extensively in diagnostic imaging and genetic testing. Cognitive analytics adds value by suggesting interventions that optimize health outcomes using real-time data and machine learning. Herein, we report the results of a pilot study of the Jvion, Inc. Cognitive Clinical Success Machine (CCSM), an eigenvector-based deep learning AI technology. Methods: The CCSM uses electronic medical record (EMR) data and publicly available socioeconomic/behavioral databases to create an n-dimensional space within which patients are mapped along vectors, resulting in thousands of relevant clusters of clinically/behaviorally similar patients. These clusters, which are updated dynamically with new data from the site, have a mathematical propensity to respond to a clinical intervention. The CCSM generates recommendations for the provider to consider as they develop a care plan based on the patient’s cluster. We tested and trained the CCSM technology at 3 US oncology practices for the risk (low, intermediate, high) of 4 specific outcomes: 30-day severe pain, 30-day mortality, 6-month clinical deterioration (ECOG-PS), and 6-month diagnosis of major depressive disorder (MDD). We report the accuracy of the CCSM based on the testing and training data sets. Area under the curve (AUC) was calculated to show goodness of fit of the classification models for each outcome. Results: In the training/testing data set there were 371,787 patients from the 3 sites: female = 61.3%; age ≤ 50 = 21.3%, 51-65 = 26.9%, > 65 = 51.9%; white/Caucasian = 43.4%, black/African American = 5.9%, unknown race = 43.4%. Cancer type was unknown/missing for 66.3% of patients and stage for 90.4% of patients. AUC range per vector: 30-day severe/recurrent pain = 0.85-0.90; 30-day mortality = 0.86-0.97; 6-month ECOG-PS decline of 1 point = 0.88-0.92; and 6-month diagnosis of MDD = 0.77-0.90.
Conclusions: The high AUC indicates good separation between true positives/negatives (proper model specification for classifying the risk of each outcome) regardless of the degree of missing data for variables including cancer type and stage. Following testing, a 6 month pilot program was implemented (06/2018-11/2018). Final results of the pilot program are pending.
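The reported AUCs can be interpreted as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. A minimal rank-based sketch (not the study's implementation):

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-based AUC: probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```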

