The Concepts of Pseudo Compound Poisson and Partition Representations in Discrete Probability

2015 ◽  
Vol 2015 ◽  
pp. 1-6
Author(s):  
Werner Hürlimann

The mathematical/statistical concepts of pseudo compound Poisson and partition representations in discrete probability are reviewed and clarified. A combinatorial interpretation of the convolution of geometric distributions is obtained in terms of a variant of Newton’s identities. The practical use of the twofold convolution leads to an improved goodness-of-fit for a data set from automobile insurance that had until now not been fitted satisfactorily.
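The twofold convolution mentioned above can be illustrated numerically; a minimal sketch, assuming geometric pmfs with support 0, 1, 2, … and illustrative parameters (not fitted values from the paper):

```python
# A minimal sketch of the twofold convolution of two geometric distributions.
# Assumed pmf: P(X = k) = p * (1 - p)**k for k = 0, 1, 2, ... (truncated at kmax);
# the parameters 0.6 and 0.4 are illustrative.

def geom_pmf(p, kmax):
    """Truncated geometric pmf on {0, ..., kmax}."""
    return [p * (1 - p) ** k for k in range(kmax + 1)]

def convolve(f, g):
    """Discrete convolution: pmf of the sum of two independent counts."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h

f = geom_pmf(0.6, 200)
g = geom_pmf(0.4, 200)
h = convolve(f, g)  # pmf of the twofold convolution
```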

Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 505
Author(s):  
Lluís Bermúdez ◽  
Dimitris Karlis

A multivariate INAR(1) regression model based on the Sarmanov distribution is proposed for modelling claim counts from an automobile insurance contract with different types of coverage. The correlation between claims from different coverage types is considered jointly with the serial correlation between the observations of the same policyholder over time. Several models based on the multivariate Sarmanov distribution are analyzed. The new models retain all the advantages of the MINAR(1) regression model while allowing for a more flexible dependence structure through the Sarmanov distribution. The models are fitted to a real panel data set, and their goodness of fit and computational efficiency are discussed.
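An INAR(1) process of the kind underlying these models is commonly defined through binomial thinning, X_t = α ∘ X_{t−1} + ε_t. A minimal univariate simulation sketch with illustrative parameters (the paper's multivariate Sarmanov specification is considerably more involved):

```python
import math
import random

def inar1_path(alpha, lam, n, x0=0, seed=1):
    """Simulate X_t = alpha ∘ X_{t-1} + eps_t, where ∘ is binomial thinning
    (each of the previous counts survives with probability alpha) and
    eps_t ~ Poisson(lam). All parameter values are illustrative."""
    rng = random.Random(seed)

    def poisson(mean):
        # Knuth's multiplication sampler, adequate for small means
        limit = math.exp(-mean)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    path = [x0]
    for _ in range(n):
        survivors = sum(1 for _ in range(path[-1]) if rng.random() < alpha)
        path.append(survivors + poisson(lam))
    return path
```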


Author(s):  
Raul E. Avelar ◽  
Karen Dixon ◽  
Boniphace Kutela ◽  
Sam Klump ◽  
Beth Wemple ◽  
...  

The calibration of safety performance functions (SPFs) is a mechanism included in the Highway Safety Manual (HSM) to adjust SPFs in the HSM for use in intended jurisdictions. Critically, the quality of the calibration procedure must be assessed before using the calibrated SPFs. Multiple resources to aid practitioners in calibrating SPFs have been developed in the years following the publication of the HSM 1st edition. Similarly, the literature suggests multiple ways to assess the goodness-of-fit (GOF) of a calibrated SPF to a data set from a given jurisdiction. This paper uses the calibration results of multiple intersection SPFs to a large Mississippi safety database to examine the relations between multiple GOF metrics. The goal is to develop a sensible single index that leverages the joint information from multiple GOF metrics to assess overall quality of calibration. A factor analysis applied to the calibration results revealed three underlying factors explaining 76% of the variability in the data. From these results, the authors developed an index and performed a sensitivity analysis. The key metrics were found to be, in descending order: the deviation of the cumulative residual (CURE) plot from the 95% confidence area, the mean absolute deviation, the modified R-squared, and the value of the calibration factor. This paper also presents comparisons between the index and alternative scoring strategies, as well as an effort to verify the results using synthetic data. The developed index is recommended to comprehensively assess the quality of the calibrated intersection SPFs.
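Two of the key metrics named above have simple closed forms. A hedged sketch of the HSM-style calibration factor, C = Σ observed / Σ predicted, and the mean absolute deviation evaluated at C (the counts below are made-up illustrative numbers):

```python
# Hedged sketch of two calibration GOF quantities; observed/predicted
# crash counts here are illustrative, not from the Mississippi database.

def calibration_factor(observed, predicted):
    """HSM-style calibration factor: total observed over total predicted."""
    return sum(observed) / sum(predicted)

def mean_absolute_deviation(observed, predicted, c):
    """Mean absolute deviation of observed counts from calibrated predictions."""
    return sum(abs(o - c * p) for o, p in zip(observed, predicted)) / len(observed)

obs = [3, 1, 4]
pred = [2, 2, 2]
c = calibration_factor(obs, pred)
mad = mean_absolute_deviation(obs, pred, c)
```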


2021 ◽  
Vol 503 (2) ◽  
pp. 2688-2705
Author(s):  
C Doux ◽  
E Baxter ◽  
P Lemos ◽  
C Chang ◽  
A Alarcon ◽  
...  

ABSTRACT Beyond-ΛCDM physics or systematic errors may cause subsets of a cosmological data set to appear inconsistent when analysed assuming ΛCDM. We present an application of internal consistency tests to measurements from the Dark Energy Survey Year 1 (DES Y1) joint probes analysis. Our analysis relies on computing the posterior predictive distribution (PPD) for these data under the assumption of ΛCDM. We find that the DES Y1 data have an acceptable goodness of fit to ΛCDM, with a probability of finding a worse fit by random chance of p = 0.046. Using numerical PPD tests, supplemented by graphical checks, we show that most of the data vector appears completely consistent with expectations, although we observe a small tension between large- and small-scale measurements. A small part (roughly 1.5 per cent) of the data vector shows an unusually large departure from expectations; excluding this part of the data has negligible impact on cosmological constraints, but does significantly improve the p-value to 0.10. The methodology developed here will be applied to test the consistency of DES Year 3 joint probes data sets.
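The p-values quoted above follow the usual posterior predictive logic: the fraction of replicated data sets whose test statistic is at least as extreme as the observed one. A minimal sketch (not the DES pipeline):

```python
def ppd_p_value(stat_obs, stats_rep):
    """Posterior predictive p-value: fraction of replicated data sets whose
    test statistic is at least as extreme as the observed one."""
    return sum(1 for s in stats_rep if s >= stat_obs) / len(stats_rep)

# Illustrative use with made-up statistics from four replications:
p = ppd_p_value(5.0, [1.0, 2.0, 6.0, 7.0])
```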


2010 ◽  
Vol 2 (2) ◽  
pp. 38-51 ◽  
Author(s):  
Marc Halbrügge

Keep it simple - A case study of model development in the context of the Dynamic Stocks and Flows (DSF) task

This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open-ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models, while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measure based on rank orders and sequence matching techniques is proposed instead. This measure, when applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.
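The paper's rank-order/sequence-matching measure is not reproduced here; as a loose analogue, Python's difflib computes a similarity ratio between two orderings of events:

```python
import difflib

# Illustrative orderings of task events; the paper's actual measure combines
# rank orders with sequence matching and is not reproduced here.
model_order = ["A", "B", "C", "D"]
human_order = ["A", "C", "B", "D"]

# ratio = 2 * (number of matched elements) / (total length of both sequences)
ratio = difflib.SequenceMatcher(None, model_order, human_order).ratio()
```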


Symmetry ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 80 ◽  
Author(s):  
Martynas Narmontas ◽  
Petras Rupšys ◽  
Edmundas Petrauskas

In this work, we employ stochastic differential equations (SDEs) to model tree stem taper. SDE stem taper models have some theoretical advantages over the commonly employed regression-based stem taper modeling techniques, as SDE models have both simple analytic forms and a high level of accuracy. We perform fixed- and mixed-effects parameter estimation for the stem taper models by developing an approximated maximum likelihood procedure and using a data set of longitudinal measurements from 319 mountain pine trees. The symmetric Vasicek-type and asymmetric Gompertz-type diffusion processes used adequately describe stem taper evolution. The proposed SDE stem taper models are compared to four regression stem taper equations and four volume equations. Overall, the best goodness-of-fit statistics are produced by the mixed-effects SDE stem taper models. All results are obtained in the Maple computer algebra system.
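A Vasicek-type SDE, dX_t = κ(θ − X_t)dt + σ dW_t, can be simulated with a simple Euler–Maruyama scheme. A minimal sketch with illustrative parameters (the paper's mixed-effects maximum likelihood estimation is far more involved):

```python
import math
import random

def vasicek_path(x0, kappa, theta, sigma, dt, n, seed=42):
    """Euler-Maruyama simulation of dX = kappa*(theta - X)*dt + sigma*dW.
    All parameter values used with this sketch are illustrative."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x.append(x[-1] + kappa * (theta - x[-1]) * dt + sigma * dw)
    return x
```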


2020 ◽  
Vol 21 (15) ◽  
pp. 5280
Author(s):  
Irini Furxhi ◽  
Finbarr Murphy

The practice of non-testing approaches in nanoparticle hazard assessment is necessary to identify and classify potential risks in a cost-effective and timely manner. Machine learning techniques have been applied in the field of nanotoxicology with encouraging results. A neurotoxicity classification model for diverse nanoparticles is presented in this study. A data set compiled from multiple literature sources, consisting of nanoparticle physicochemical properties, exposure conditions, and in vitro characteristics, is used to predict cell viability. Pre-processing techniques were applied, including normalization methods and two supervised instance methods: a synthetic minority over-sampling technique to address biased predictions, and the production of subsamples via bootstrapping. The classification model was developed using random forest, and goodness-of-fit together with additional robustness and predictability metrics was used to evaluate performance. Information gain analysis identified exposure dose and duration, toxicological assay, cell type, and zeta potential as the five most important attributes for predicting neurotoxicity in vitro. This is the first tissue-specific machine learning tool for predicting neurotoxicity caused by nanoparticles in in vitro systems. The model performs better than non-tissue-specific models.
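The information gain analysis mentioned above ranks attributes by the reduction in label entropy achieved by splitting on them. A minimal sketch (not the study's actual pipeline):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into `groups` by an attribute."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)
```

For a perfectly separating attribute on balanced binary labels, the gain equals the full one bit of label entropy.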


Biostatistics ◽  
2020 ◽  
Author(s):  
Chien-Lin Su ◽  
Robert W Platt ◽  
Jean-François Plante

Summary Recurrent event data are commonly encountered in observational studies where each subject may experience a particular event repeatedly over time. In this article, we aim to compare cumulative rate functions (CRFs) of two groups when treatment assignment may depend on the unbalanced distribution of confounders. Several estimators based on pseudo-observations are proposed to adjust for the confounding effects, namely inverse probability of treatment weighting estimator, regression model-based estimators, and doubly robust estimators. The proposed marginal regression estimator and doubly robust estimators based on pseudo-observations are shown to be consistent and asymptotically normal. A bootstrap approach is proposed for the variance estimation of the proposed estimators. Model diagnostic plots of residuals are presented to assess the goodness-of-fit for the proposed regression models. A family of adjusted two-sample pseudo-score tests is proposed to compare two CRFs. Simulation studies are conducted to assess finite sample performance of the proposed method. The proposed technique is demonstrated through an application to a hospital readmission data set.
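The inverse probability of treatment weighting estimator referenced above assigns each subject the weight 1/e(x) if treated and 1/(1 − e(x)) otherwise, where e(x) is the propensity score. A minimal sketch with illustrative values:

```python
def iptw_weights(treated, propensity):
    """Plain IPTW weights: 1/e for treated subjects, 1/(1 - e) for controls,
    where e is the estimated propensity score. Inputs here are illustrative."""
    return [1 / e if t else 1 / (1 - e) for t, e in zip(treated, propensity)]

# Example: one treated and one control subject, both with propensity 0.5
w = iptw_weights([1, 0], [0.5, 0.5])
```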


1987 ◽  
Vol 44 (8) ◽  
pp. 1432-1442 ◽  
Author(s):  
Kenneth H. Reckhow ◽  
Robert W. Black ◽  
Thomas B. Stockton Jr. ◽  
J. David Vogt ◽  
Judith G. Wood

A large historical data set from the Adirondack region of New York was compiled to study the relationship between water chemistry variables associated with acid precipitation and the presence/absence of selected fish species. The data set was used to examine simple statistical models for fish presence/absence, as a function of the water chemistry variables, for brook trout (Salvelinus fontinalis), lake trout (Salvelinus namaycush), white sucker (Catostomus commersoni), and yellow perch (Perca flavescens). Of these models, only those for brook trout and lake trout were found to be acceptable based on statistical goodness-of-fit criteria; thus, parameters for models of these two species alone were estimated using maximum likelihood logistic regression. Candidate models for brook trout and lake trout were then examined, with particular consideration for the problems associated with model misspecification, errors-in-variables, and multicollinearity. For each of the two species, a model was recommended that may be used to predict the effect of changes in lake acidification on species presence/absence in lakes in the Adirondack region.
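Logistic regression models of this kind predict presence probability as a logistic function of water chemistry covariates. A hedged sketch with a single covariate and hypothetical coefficients (not the paper's fitted values):

```python
import math

def presence_probability(ph, b0, b1):
    """Logistic presence/absence model with a single covariate (lake pH here):
    P(present) = 1 / (1 + exp(-(b0 + b1 * pH))).
    The coefficients are hypothetical, not the paper's fitted values."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * ph)))
```

With a positive pH coefficient, predicted presence probability rises as acidification decreases, which is the qualitative pattern such models are used to capture.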


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. 4315-4315
Author(s):  
Shoichi Nagakura ◽  
Tetsuyuki Kiyokawa ◽  
Michihiro Hidaka ◽  
Takahiro Yano ◽  
Kazutaka Sunami ◽  
...  

Abstract BACKGROUND: Despite the recent increase in reduced-intensity conditioning (RIC) transplantation, mortality rates after RIC and myeloablative conditioning (MAC) HSCT remain high, and hepatic veno-occlusive disease (VOD) cannot be accurately predicted. OBJECTIVE: To determine the value of risk factors associated with the development of VOD after allogeneic HSCT with RIC and MAC. Estimating VOD risk based on clinical factors may further improve the results of allogeneic HSCT. PATIENTS AND METHODS: A retrospective review of 415 consecutive allogeneic HSCTs was performed with attention to VOD, pre-transplant factors, and laboratory data at five hematopoietic cell transplantation centers between 2000 and 2005. Patients underwent transplantation with MAC (n=247) or RIC (n=168). Main outcomes and risk factors were analyzed in multivariable analyses (a logistic regression model) for RIC and MAC. Three kinds of laboratory data sets were analyzed: pre-transplant (day −10), post-transplant (day 20), and the differences from pre-transplant to post-transplant. RESULTS: VOD occurred in 65 of 415 (15.7%) transplant recipients; 40 of 247 (16.1%) with MAC and 25 of 168 (14.9%) with RIC. Multivariate analyses identified risk factors for the development of VOD with MAC (albumin level, creatinine level) and with RIC (HCT-CI, number of prior chemotherapy regimens, ALT) in the pre-transplant laboratory data set. Risk factors for VOD were also identified in the post-transplant and differences data sets (Table). The Akaike information criterion (AIC) of the risk factors for the differences data was better than for the post-transplant data. CONCLUSION: Our results provide risk factors for VOD with MAC and RIC. Estimating VOD risk before transplantation may be useful for the selection of conditioning regimens. Differences in laboratory data over the time course of transplantation may be useful for the early diagnosis of VOD.
Table. Multivariate risk factors for VOD (odds ratio, P-value) by conditioning regimen and laboratory data set.

MAC       Pre-transplant data    Post-transplant data   Differences data
          OR       P-value       OR       P-value       OR       P-value
Age       -        -             0.945    0.0090        -        -
Alb       0.290    0.0125        -        -             -        -
Cr        10.204   0.0307        1.786    0.0039        1.984    0.0139
TPro      -        -             0.358    0.0019        -        -
TBil      -        -             1.385    0.0027        1.314    0.0037
Ara-C     -        -             5.000    0.0139        -        -
Goodness of fit (AIC): 106.727 | 126.499 | 86.931

RIC       Pre-transplant data    Post-transplant data   Differences data
          OR       P-value       OR       P-value       OR       P-value
Sex       -        -             3.401    0.0446        -        -
HCT-CI    3.922    0.0050        2.000    0.0123        -        -
ImpScore  2.000    0.0314        -        -             -        -
TPro      -        -             0.366    0.0091        -        -
TBil      -        -             1.675    0.0042        2.273    0.0004
ALT       0.969    0.0432        -        -             -        -
CY        -        -             -        -             5.682    0.0447
Goodness of fit (AIC): 61.552 | 91.09 | 52.808
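The AIC comparison above uses the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and ln L the maximized log-likelihood; lower values indicate a better trade-off of fit against complexity. A minimal sketch:

```python
def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2*ln(L); lower values are better."""
    return 2 * k - 2 * log_likelihood
```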


2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e18098-e18098
Author(s):  
John Frownfelter ◽  
Sibel Blau ◽  
Ray D. Page ◽  
John Showalter ◽  
Kelly Miller ◽  
...  

e18098 Background: Artificial intelligence (AI) for predictive analytics has been studied extensively in diagnostic imaging and genetic testing. Cognitive analytics adds value by suggesting interventions that optimize health outcomes using real-time data and machine learning. Herein, we report the results of a pilot study of the Jvion, Inc. Cognitive Clinical Success Machine (CCSM), an eigenvector-based deep learning AI technology. Methods: The CCSM uses electronic medical record (EMR) data and publicly available socioeconomic/behavioral databases to create an n-dimensional space within which patients are mapped along vectors, resulting in thousands of relevant clusters of clinically/behaviorally similar patients. These clusters, which are updated dynamically with new data from the site, have a mathematical propensity to respond to a clinical intervention. The CCSM generates recommendations for the provider to consider as they develop a care plan based on the patient’s cluster. We tested and trained the CCSM technology at 3 US oncology practices for the risk (low, intermediate, high) of 4 specific outcomes: 30-day severe pain, 30-day mortality, 6-month clinical deterioration (ECOG-PS), and 6-month diagnosis of major depressive disorder (MDD). We report the accuracy of the CCSM based on the testing and training data sets. Area under the curve (AUC) was calculated to show goodness of fit of the classification models for each outcome. Results: In the training/testing data set there were 371,787 patients from the 3 sites: female = 61.3%; age ≤ 50 = 21.3%, 51-65 = 26.9%, > 65 = 51.9%; white/Caucasian = 43.4%, black/African American = 5.9%, unknown race = 43.4%. Cancer type was unknown/missing for 66.3% of patients and stage for 90.4% of patients. AUC range per vector: 30-day severe/recurrent pain = 0.85-0.90; 30-day mortality = 0.86-0.97; 6-month ECOG-PS decline of 1 point = 0.88-0.92; and 6-month diagnosis of MDD = 0.77-0.90.
Conclusions: The high AUC indicates good separation between true positives/negatives (proper model specification for classifying the risk of each outcome) regardless of the degree of missing data for variables including cancer type and stage. Following testing, a 6 month pilot program was implemented (06/2018-11/2018). Final results of the pilot program are pending.
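The reported AUCs can be interpreted as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. A minimal rank-based sketch (not the study's implementation):

```python
def auc_from_scores(pos_scores, neg_scores):
    """Rank-based AUC: probability that a random positive outscores a random
    negative, with ties counted as one half (Mann-Whitney formulation)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```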

