How well does your model fit the data?

The conserved nature of sleep in Drosophila has allowed the fruit fly to emerge in the last decade as a powerful model organism in which to study sleep. Recent sleep studies in Drosophila have focused on the discovery and characterization of hyposomnolent mutants. One common feature of these animals is a change in sleep architecture: sleep bout count tends to be greater, and sleep bout length lower, in hyposomnolent mutants. I propose a mathematical model, produced by least-squares nonlinear regression to fit the form Y = aX^b, which can explain sleep behavior in the healthy animal as well as previously-reported changes in total sleep and sleep architecture in hyposomnolent mutants. This model, fit to sleep data, yields coefficient of determination R squared, which describes goodness of fit. R squared is lower in hyposomnolent mutant insomniac as compared to control, indicating a poorer fit of the model to the data in insomniac. R squared also tends to be lower in daytime sleep as compared to nighttime sleep. My findings raise the possibility that low R squared is a feature of all hyposomnolent mutants, not just insomniac. If this were the case, R squared could emerge as a novel means by which sleep researchers might assess sleep dysfunction.
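The abstract names the model form Y = aX^b, least-squares nonlinear regression, and R squared, but not the fitting software or what X and Y measure. The following is a minimal sketch of such a fit under those gaps: the function names, the Gauss-Newton solver, and the sample data are illustrative assumptions, not the author's method or data.

```python
import math

def fit_power_law(xs, ys, iterations=50):
    """Fit y = a * x**b by nonlinear least squares (Gauss-Newton sketch)."""
    # Starting guess: straight-line fit in log-log space (needs x > 0 and y > 0).
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    for _ in range(iterations):
        # Residuals and the two Jacobian columns (d/da and d/db of a * x**b).
        r = [y - a * x ** b for x, y in zip(xs, ys)]
        ja = [x ** b for x in xs]
        jb = [a * x ** b * math.log(x) for x in xs]
        # Solve the 2x2 normal equations (J^T J) delta = J^T r by Cramer's rule.
        saa = sum(p * p for p in ja)
        sab = sum(p * q for p, q in zip(ja, jb))
        sbb = sum(q * q for q in jb)
        ga = sum(p * e for p, e in zip(ja, r))
        gb = sum(q * e for q, e in zip(jb, r))
        det = saa * sbb - sab * sab
        if abs(det) < 1e-300:
            break
        da = (ga * sbb - gb * sab) / det
        db = (gb * saa - ga * sab) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b

def r_squared(xs, ys, a, b):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - a * x ** b) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up data lying exactly on y = 2 * x**-0.5 (not taken from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x ** -0.5 for x in xs]
a, b = fit_power_law(xs, ys)
print(a, b, r_squared(xs, ys, a, b))
```

Data that scatter around the curve leave a larger residual sum of squares and so a lower R squared, which is the quantity the abstract compares between mutant and control lines.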

Goodness of fit to a mathematical model for Drosophila sleep behavior is reduced in hyposomnolent mutants

PeerJ ◽

10.7717/peerj.1533 ◽

2016 ◽

Vol 4 ◽

pp. e1533 ◽

Cited By ~ 2

Author(s):

Joshua M. Diamond

Keyword(s):

Mathematical Model ◽

Goodness Of Fit ◽

Model Organism ◽

Fruit Fly ◽

Model Fit ◽

Sleep Architecture ◽

Coefficient Of Determination ◽

Sleep Behavior ◽

Sleep Studies

The conserved nature of sleep in Drosophila has allowed the fruit fly to emerge in the last decade as a powerful model organism in which to study sleep. Recent sleep studies in Drosophila have focused on the discovery and characterization of hyposomnolent mutants. One common feature of these animals is a change in sleep architecture: sleep bout count tends to be greater, and sleep bout length lower, in hyposomnolent mutants. I propose a mathematical model, produced by least-squares nonlinear regression to fit the form Y = aX^b, which can explain sleep behavior in the healthy animal as well as previously-reported changes in total sleep and sleep architecture in hyposomnolent mutants. This model, fit to sleep data, yields coefficient of determination R squared, which describes goodness of fit. R squared is lower, as compared to control, in hyposomnolent mutants insomniac and fumin. My findings raise the possibility that low R squared is a feature of all hyposomnolent mutants, not just insomniac and fumin. If this were the case, R squared could emerge as a novel means by which sleep researchers might assess sleep dysfunction.

From R-squared to coefficient of model accuracy for assessing "goodness-of-fits"

10.5194/gmd-2020-51 ◽

2020 ◽

Cited By ~ 1

Author(s):

Charles Onyutha

Keyword(s):

Goodness Of Fit ◽

Classical Model ◽

Model Performance ◽

Model Fit ◽

Coefficient Of Determination ◽

Hydrological Models ◽

Objective Functions ◽

Model Accuracy ◽

Sample Correlation ◽

Model Residuals

Abstract. Modelers tend to focus more on advancing methods of statistical and mathematical modeling than on developing novel techniques for comparing modeled results with observations or establishing metrics for model performance assessment. Perhaps the most extensively applied "goodness-of-fit" measure, especially for assessing the performance of regression models, is the coefficient of determination R2. Normally, high R2 tends to be associated with an efficient model. Nevertheless, R2 has been cited as having no importance in the classical model of regression. Even in its use in descriptive statistics, R2 is known to have questionable justification. R2 is inadequate for assessing model performance because it does not give any information on the model residuals. Furthermore, R2 can be low for an effective model. Conversely, a very poor model fit can yield high R2. Regressing X on Y yields the same R2 as regressing Y on X, which invalidates its use as a coefficient of determination. Taking into account the drawbacks of using R2, this paper introduces the coefficient of model accuracy (CMA), the derivation of which draws on an analogy to R2. However, instead of simply squaring an ordinary Pearson's product-moment correlation coefficient to obtain R2, CMA comprises the product of nonparametric sample correlation and model bias. The acceptability of the introduced method is demonstrated through comparison of results from simulations by hydrological models calibrated using CMA and other existing objective functions.
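The symmetry noted in the abstract can be checked numerically for a straight-line fit with an intercept, where R2 equals the squared Pearson correlation. The sketch below is only an illustration of that one property: the helper name and the data are assumptions, and it does not implement CMA.

```python
def linear_r_squared(xs, ys):
    """R^2 of the ordinary least-squares line that predicts ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Made-up sample, used only to compare the two regression directions.
x_vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_vals = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
r2_y_on_x = linear_r_squared(x_vals, y_vals)
r2_x_on_y = linear_r_squared(y_vals, x_vals)
print(r2_y_on_x, r2_x_on_y)
```

The two values agree up to floating-point rounding, so R2 alone cannot indicate which variable is being explained by which.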

A mathematical model describes Drosophila sleep behavior in w1118 controls and in a hyposomnolent insomniac line

10.7287/peerj.preprints.1408 ◽

2015 ◽

Author(s):

Joshua M Diamond

Keyword(s):

Mathematical Model ◽

Goodness Of Fit ◽

Model Organism ◽

Fruit Fly ◽

Model Fit ◽

Sleep Architecture ◽

Coefficient Of Determination ◽

Sleep Behavior ◽

Nighttime Sleep

The conserved nature of sleep in Drosophila has allowed the fruit fly to emerge in the last decade as a powerful model organism in which to study sleep. Recent sleep studies in Drosophila have focused on the discovery and characterization of hyposomnolent mutants. One common feature of these animals is a change in sleep architecture: sleep bout count tends to be greater, and sleep bout length lower, in hyposomnolent mutants. I propose a mathematical model, produced by least-squares nonlinear regression to fit the form Y = aX^b, which can explain sleep behavior in the healthy animal as well as previously-reported changes in total sleep and sleep architecture in hyposomnolent mutants. This model, fit to sleep data, yields coefficient of determination R squared, which describes goodness of fit. R squared is lower in hyposomnolent mutant insomniac as compared to control, indicating a poorer fit of the model to the data in insomniac. R squared also tends to be lower in daytime sleep as compared to nighttime sleep. My findings raise the possibility that low R squared is a feature of all hyposomnolent mutants, not just insomniac. If this were the case, R squared could emerge as a novel means by which sleep researchers might assess sleep dysfunction.

Simple Index to Assess the Calibration Quality of Safety Performance Functions Based on Multiple Goodness-of-Fit Metrics

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211008896 ◽

2021 ◽

pp. 036119812110088

Author(s):

Raul E. Avelar ◽

Karen Dixon ◽

Boniphace Kutela ◽

Sam Klump ◽

Beth Wemple ◽

...

Keyword(s):

Goodness Of Fit ◽

Synthetic Data ◽

Calibration Procedure ◽

Safety Performance ◽

Absolute Deviation ◽

Data Set ◽

Safety Database ◽

Simple Index ◽

Safety Performance Functions

The calibration of safety performance functions (SPFs) is a mechanism included in the Highway Safety Manual (HSM) to adjust SPFs in the HSM for use in intended jurisdictions. Critically, the quality of the calibration procedure must be assessed before using the calibrated SPFs. Multiple resources to aid practitioners in calibrating SPFs have been developed in the years following the publication of the HSM 1st edition. Similarly, the literature suggests multiple ways to assess the goodness-of-fit (GOF) of a calibrated SPF to a data set from a given jurisdiction. This paper uses the calibration results of multiple intersection SPFs to a large Mississippi safety database to examine the relations between multiple GOF metrics. The goal is to develop a sensible single index that leverages the joint information from multiple GOF metrics to assess overall quality of calibration. A factor analysis applied to the calibration results revealed three underlying factors explaining 76% of the variability in the data. From these results, the authors developed an index and performed a sensitivity analysis. The key metrics were found to be, in descending order: the deviation of the cumulative residual (CURE) plot from the 95% confidence area, the mean absolute deviation, the modified R-squared, and the value of the calibration factor. This paper also presents comparisons between the index and alternative scoring strategies, as well as an effort to verify the results using synthetic data. The developed index is recommended to comprehensively assess the quality of the calibrated intersection SPFs.

Evaluation of NASA POWER Reanalysis Products to Estimate Daily Weather Variables in a Hot Summer Mediterranean Climate

Agronomy ◽

10.3390/agronomy11061207 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1207

Author(s):

Gonçalo C. Rodrigues ◽

Ricardo P. Braga

Keyword(s):

Wind Speed ◽

Bias Correction ◽

Goodness Of Fit ◽

Mediterranean Climate ◽

Weather Data ◽

Coefficient Of Determination ◽

Mean Bias Error ◽

Bias Error ◽

Weather Variables ◽

Weather Stations

This study aims to evaluate NASA POWER reanalysis products for daily surface maximum (Tmax) and minimum (Tmin) temperatures, solar radiation (Rs), relative humidity (RH) and wind speed (Ws) when compared with observed data from 14 distributed weather stations across Alentejo Region, Southern Portugal, with a hot summer Mediterranean climate. Results showed that there is good agreement between NASA POWER reanalysis and observed data for all parameters, except for wind speed, with coefficient of determination (R2) higher than 0.82, with normalized root mean square error (NRMSE) varying from 8 to 20%, and a normalized mean bias error (NMBE) ranging from –9 to 26%, for those variables. Based on these results, and in order to improve the accuracy of the NASA POWER dataset, two bias corrections were applied to all weather variables: one for the Alentejo Region as a whole; another for each location individually. Results improved significantly, especially when a local bias correction is performed, with Tmax and Tmin presenting an improvement of the mean NRMSE of 6.6 °C (from 8.0 °C) and 16.1 °C (from 20.5 °C), respectively, while a mean NMBE decreased from 10.65 to 0.2%. Rs results also show a very high goodness of fit with a mean NRMSE of 11.2% and mean NMBE equal to 0.1%. Additionally, bias corrected RH data performed acceptably with an NRMSE lower than 12.1% and an NMBE below 2.1%. However, even when a bias correction is performed, Ws lacks the performance shown by the remaining weather variables, with an NRMSE never lower than 19.6%. Results show that NASA POWER can be useful for the generation of weather data sets where ground weather station data is missing or unavailable.
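The abstract reports NRMSE and NMBE without defining them. As a rough sketch, the code below uses one common convention (both normalized by the observations and expressed in percent); the paper's exact definitions may differ, and the function names and numbers are illustrative assumptions.

```python
import math

def nrmse_percent(observed, estimated):
    """Root mean square error divided by the mean observation, in percent."""
    n = len(observed)
    mse = sum((e - o) ** 2 for o, e in zip(observed, estimated)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)

def nmbe_percent(observed, estimated):
    """Summed bias (estimate minus observation) over summed observations, in percent."""
    bias = sum(e - o for o, e in zip(observed, estimated))
    return 100.0 * bias / sum(observed)

# Made-up station readings and gridded estimates, not data from the study.
station = [10.0, 20.0, 30.0]
gridded = [11.0, 19.0, 33.0]
print(nrmse_percent(station, gridded), nmbe_percent(station, gridded))
```

Under this convention a positive NMBE means the estimates run high on average, while NRMSE grows with the spread of the errors regardless of their sign.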

Predicting effects of selected impregnation processes on the observed bending strength of wood, with use of data mining models

BioResources ◽

10.15376/biores.16.3.4891-4904 ◽

2021 ◽

Vol 16 (3) ◽

pp. 4891-4904

Author(s):

Selahattin Bardak ◽

Timucin Bardak ◽

Hüseyin Peker ◽

Eser Sözen ◽

Yildiz Çabuk

Keyword(s):

Data Mining ◽

Goodness Of Fit ◽

Bending Strength ◽

Wood Preservation ◽

Ambient Conditions ◽

Data Mining Algorithms ◽

Use Of Data ◽

The Cost ◽

Mining Algorithms ◽

The Relationship

Wood materials have been used in many products such as furniture, stairs, windows, and doors for centuries. There are differences in methods used to adapt wood to ambient conditions. Impregnation is a widely used method of wood preservation. In terms of efficiency, it is critical to optimize the parameters for impregnation. Data mining techniques reduce most of the cost and operational challenges with accurate prediction in the wood industry. In this study, three data-mining algorithms were applied to predict bending strength in impregnated wood materials (Pinus sylvestris L. and Millettia laurentii). Models were created from real experimental data to examine the relationship between bending strength, diffusion time, vacuum duration, and wood type, based on decision trees (DT), random forest (RF), and Gaussian process (GP) algorithms. The highest bending strength was achieved with wenge (Millettia laurentii) wood under a 10 bar vacuum and a diffusion time of 25 min. The results showed that all algorithms are suitable for predicting bending strength. The goodness of fit for the testing phase was determined as 0.994, 0.986, and 0.989 in the DT, RF, and GP algorithms, respectively. Moreover, the importance of attributes was determined in the algorithms.

Subsite mapping of enzymes. Depolymerase computer modelling

Biochemical Journal ◽

10.1042/bj1590105 ◽

1976 ◽

Vol 159 (1) ◽

pp. 105-120 ◽

Cited By ~ 36

Author(s):

J D Allen ◽

J A Thoma

Keyword(s):

Goodness Of Fit ◽

Computer Modelling ◽

Bond Cleavage ◽

Calculated Data ◽

Simulated Data ◽

Binding Energies ◽

Rate Coefficients ◽

Minimization Procedure ◽

Squared Residuals ◽

Subsite Mapping

We have developed a depolymerase computer model that uses a minimization routine. The model is designed so that, given experimental bond-cleavage frequencies for oligomeric substrates and experimental Michaelis parameters as a function of substrate chain length, the optimum subsite map is generated. The minimized sum of the weighted-squared residuals of the experimental and calculated data is used as a criterion of the goodness-of-fit for the optimized subsite map. The application of the minimization procedure to subsite mapping is explored through the use of simulated data. A procedure is developed whereby the minimization model can be used to determine the number of subsites in the enzymic binding region and to locate the position of the catalytic amino acids among these subsites. The degree of propagation of experimental variance into the subsite-binding energies is estimated. The question of whether hydrolytic rate coefficients are constant or a function of the number of filled subsites is examined.

A Revised Model of Trust in Internet-Based Health Information and Advice: Cross-Sectional Questionnaire Study

Journal of Medical Internet Research ◽

10.2196/11125 ◽

2019 ◽

Vol 21 (11) ◽

pp. e11125

Author(s):

Elizabeth Sillence ◽

John Matthew Blythe ◽

Pam Briggs ◽

Mark Moss

Keyword(s):

Goodness Of Fit ◽

Online Survey ◽

Patient Experiences ◽

Model Fit ◽

Key Factors ◽

The United Kingdom ◽

Personal Experiences ◽

And Coping ◽

Fit Index ◽

Significant Direct Effect

Background: The internet continues to offer new forms of support for health decision making. Government, charity, and commercial websites increasingly offer a platform for shared personal health experiences, and these are just some of the opportunities that have arisen in a largely unregulated arena. Understanding how people trust and act on this information has always been an important issue and remains so, particularly as the design practices of health websites continue to evolve and raise further concerns regarding their trustworthiness. Objective: The aim of this study was to identify the key factors influencing US and UK citizens’ trust and intention to act on advice found on health websites and to understand the role of patient experiences. Methods: A total of 1123 users took part in an online survey (625 from the United States and 498 from the United Kingdom). They were asked to recall their previous visit to a health website. The online survey consisted of an updated general Web trust questionnaire to account for personal experiences plus questions assessing key factors associated with trust in health websites (information corroboration and coping perception) and intention to act. We performed principal component analysis (PCA), then explored the relationship between the factor structure and outcomes by testing the fit to the sampled data using structural equation modeling (SEM). We also explored the model fit across US and UK populations. Results: PCA of the general Web trust questionnaire revealed 4 trust factors: (1) personal experiences, (2) credibility and impartiality, (3) privacy, and (4) familiarity. In the final SEM model, trust was found to have a significant direct effect on intention to act (beta=.59; P<.001), and of the trust factors, only credibility and impartiality had a significant direct effect on trust (beta=.79; P<.001). The impact of personal experiences on trust was mediated through information corroboration (beta=.06; P=.04). Variables specific to electronic health (eHealth; information corroboration and coping) were found to substantially improve the model fit, and differences in information corroboration were found between US and UK samples. The final model accounting for all factors achieved a good fit (goodness-of-fit index [0.95], adjusted goodness-of-fit index [0.93], root mean square error of approximation [0.50], and comparative fit index [0.98]) and explained 65% of the variance in trust and 41% of the variance in intention to act. Conclusions: Credibility and impartiality continue to be key predictors of trust in eHealth websites. Websites with patient experiences can positively influence trust but only if users first corroborate the information through other sources. The need for corroboration was weaker in the United Kingdom, where website familiarity reduced the need to check information elsewhere. These findings are discussed in relation to existing trust models, patient experiences, and health literacy.

Present State-of-The-Art of Association Rule Mining Algorithms

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2202.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6398-6405

Keyword(s):

Data Mining ◽

Association Rule ◽

Association Rule Mining ◽

State Of The Art ◽

Synthetic Data ◽

Data Sets ◽

Evolutionary Analysis ◽

Rule Mining ◽

Transaction Database ◽

Mining Algorithms

Data mining is the method of extracting useful information from various repositories such as relational databases, transaction databases, spatial databases, temporal and time-series databases, data warehouses, and the World Wide Web. The functionalities of data mining include characterization and discrimination, classification and prediction, association rule mining, cluster analysis, and evolutionary analysis. Association rule mining is one of the most important techniques of data mining and aims at extracting interesting relationships within the data. In this paper we study various association rule mining algorithms, compare them using synthetic data sets, and provide the results obtained from the experimental analysis.
