The Normal Distribution And Data Transformation

2019 ◽  
Vol 412 (5) ◽  
pp. 1129-1136 ◽  
Author(s):  
Wim Broothaerts ◽  
Fernando Cordeiro ◽  
Philippe Corbisier ◽  
Piotr Robouch ◽  
Hendrik Emons

Abstract: The outcome of proficiency tests (PTs) is influenced, among other factors, by the evaluation procedure chosen by the PT provider. In particular, for PTs on GMO testing a log-data transformation is often applied to fit skewed data distributions into a normal distribution. The study presented here has challenged this commonly applied approach. The 56 data populations from proficiency testing rounds organised since 2010 by the European Union Reference Laboratory for Genetically Modified Food and Feed (EURL GMFF) were used to investigate the assumption of a normal distribution of reported results within a PT. Statistical evaluation of the data distributions, composed of 3178 reported results, revealed that 41 of the 56 datasets did indeed show a normal distribution. For 10 datasets, the deviation from normality was not statistically significant at the raw or log scale, indicating that the normality assumption cannot be rejected. The normality of the five remaining datasets was statistically significant after log-data transformation. These datasets, however, appeared to be multimodal as a result of technical/experimental issues with the applied methods. On the basis of the real datasets analysed herein, it is concluded that the log transformation of reported data in proficiency testing rounds is often not necessary and should be cautiously applied. It is further shown that the log-data transformation, when applied to PT results, favours the positive performance scoring for overestimated results and strongly penalises underestimated results. The evaluation of the participants’ performance without prior transformation of their results may highlight rather than hide relevant underlying analytical problems and is recommended as an outcome of this study.
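
The asymmetry described in the abstract above can be illustrated with a small sketch. It is not the authors' evaluation procedure: the assigned value, the standard deviation for proficiency assessment, and the function names `z_raw` and `z_log` are assumptions chosen for illustration. The log-scale standard deviation is derived from the raw-scale one by a first-order approximation so that both scores agree for results close to the assigned value.

```python
import math

def z_raw(x, assigned, sigma):
    """z-score computed on the original (untransformed) scale."""
    return (x - assigned) / sigma

def z_log(x, assigned, sigma):
    """z-score computed after a log10 transformation.

    Assumption: the log-scale standard deviation is obtained from the
    raw-scale one by a first-order approximation, sigma / (assigned * ln 10).
    """
    sigma_log = sigma / (assigned * math.log(10))
    return (math.log10(x) - math.log10(assigned)) / sigma_log

# Hypothetical PT round: assigned value 1.0, standard deviation 0.25.
ASSIGNED = 1.0
SIGMA = 0.25

# Two results equally far from the assigned value, one low and one high.
scores = {x: (z_raw(x, ASSIGNED, SIGMA), z_log(x, ASSIGNED, SIGMA))
          for x in (0.5, 1.5)}

for x, (zr, zl) in scores.items():
    print(f"x={x}: raw z={zr:+.2f}, log z={zl:+.2f}")
```

Under these assumptions both results score |z| = 2.00 on the raw scale, while on the log scale the overestimate drops to about +1.62 and the underestimate grows to about -2.77. With the common convention that |z| ≤ 2 is satisfactory, the transformation moves the low result further into the questionable range and the high result back inside the limit.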


Metals ◽  
2019 ◽  
Vol 9 (5) ◽  
pp. 493 ◽  
Author(s):  
Dongdong You ◽  
Xiaocheng Shen ◽  
Yanghui Zhu ◽  
Jianxin Deng ◽  
Fenglei Li

A Bayesian framework-based approach is proposed for the quantitative validation and calibration of the kriging metamodel established by simulation and experimental training samples of the injection mechanism in squeeze casting. The temperature data uncertainty and non-normal distribution are considered in the approach. The normality of the sample data is tested by the Anderson–Darling method. The test results show that the original difference data require transformation for Bayesian testing due to the non-normal distribution. The Box–Cox method is employed to transform the non-normal data. The hypothesis test results of the calibrated kriging model are more reliable after data transformation. The reliability of the kriging metamodel is quantitatively assessed by the calculated Bayes factor and confidence. The Bayes factor and the confidence level results indicate that the kriging model demonstrates improved accuracy and is acceptable after data transformation. The influence of the threshold ε on both the non-normally and normally distributed data in the model is quantitatively evaluated. The threshold ε has a greater influence and higher sensitivity when applied to the normal data results, based on the rapid increase within a small range of the Bayes factors and confidence levels.
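
The two steps named in the abstract above, an Anderson–Darling normality check followed by a Box–Cox transformation, can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the sample is synthetic (exact lognormal quantiles), the exponent λ = 0 is fixed by hand rather than estimated, and the helper names `box_cox` and `anderson_darling` are hypothetical. The Box–Cox transformation assumes strictly positive data.

```python
import math
from statistics import NormalDist, mean, stdev

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def anderson_darling(sample):
    """Anderson-Darling A^2 statistic against a normal distribution
    whose mean and standard deviation are estimated from the sample."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    std_normal = NormalDist()
    # Fitted normal CDF evaluated at the ordered observations.
    u = sorted(std_normal.cdf((v - m) / s) for v in sample)
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n

# Synthetic, right-skewed sample: exact quantiles of a lognormal distribution.
N = 50
raw = [math.exp(NormalDist().inv_cdf((i - 0.5) / N)) for i in range(1, N + 1)]

# lam = 0 (log transform) is an assumption made for this synthetic sample.
transformed = [box_cox(x, 0) for x in raw]

a2_raw = anderson_darling(raw)
a2_transformed = anderson_darling(transformed)
print(f"A^2 raw: {a2_raw:.3f}, A^2 after Box-Cox: {a2_transformed:.3f}")
```

A smaller A² indicates closer agreement with a normal distribution, so the statistic should fall sharply after the transformation. In practice λ would be estimated from the data, for example by maximum likelihood, and the statistic would be compared with a tabulated critical value.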


2019 ◽  
Vol 50 (5) ◽  
pp. 1267-1280
Author(s):  
Wei Xu ◽  
Xiaoying Fu ◽  
Xia Li ◽  
Ming Wang

Abstract: This paper presents a new Bayesian probabilistic forecast (BPF) model to improve the efficiency and reliability of normal distribution transformation and to describe the uncertainties of medium-range inflow forecasts with 10-day forecast horizons. In this model, the inflow data are transformed twice to reach a standard normal distribution. The Box–Cox (BC) model is first used to quickly transform the inflow data toward a normal distribution, and then the transformed data are converted to a standard normal distribution by the meta-Gaussian (MG) model. Based on the transformed inflows in the standard normal distribution, the prior and likelihood density functions of the BPF are established, respectively. In this study, the newly developed model is tested on China's Huanren hydropower reservoir and is compared with BPFs using MG and BC, separately. Comparative results show that the new BPF model exhibits significantly improved data transformation efficiency and forecast accuracy.
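
The two-stage transformation described above can be sketched roughly as follows. The abstract does not specify how the meta-Gaussian step is implemented, so this sketch assumes a common choice: an empirical normal quantile transform with plotting positions rank / (n + 1). The inflow values, the Box–Cox exponent 0.2, and the function names are hypothetical, and ties between values are assumed not to occur.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam

def normal_quantile_transform(values):
    """Map each value to the standard normal quantile of its empirical
    non-exceedance probability rank / (n + 1). Assumes no tied values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    z = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        z[idx] = std_normal.inv_cdf(rank / (n + 1))
    return z

# Hypothetical inflow series (arbitrary units), right-skewed.
inflows = [12.0, 15.5, 9.8, 40.2, 22.1, 18.3, 95.0, 11.4, 30.7, 14.9]

# Stage 1: Box-Cox with an assumed exponent, reducing the skew.
stage1 = [box_cox(q, 0.2) for q in inflows]

# Stage 2: normal quantile transform to standard normal scores.
stage2 = normal_quantile_transform(stage1)

for q, z in zip(inflows, stage2):
    print(f"inflow={q:6.1f} -> z={z:+.3f}")
```

Because both stages are monotone, the ranking of the inflows is preserved: the largest inflow receives the largest standard normal score and the smallest inflow the lowest.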


1985 ◽  
Vol 24 (03) ◽  
pp. 120-130 ◽  
Author(s):  
E. Brunner ◽  
N. Neumann

Summary: The mathematical basis of Zelen’s suggestion [4] of prerandomizing patients in a clinical trial and then asking them for their consent is investigated. The first problem is to estimate the therapy and selection effects. In the simple prerandomized design (PRD) this is possible without any problems. Similar observations have been made by Anbar [1] and McHugh [3]. However, for the double PRD additional assumptions are needed in order to render therapy and selection effects estimable. The second problem is to determine the distribution of the statistics. It has to be taken into consideration that the sample sizes are random variables in the PRDs. This is why the distribution of the statistics can only be determined asymptotically, even under the assumption of normal distribution. The behaviour of the statistics for small samples is investigated by means of simulations, where the statistics considered in the present paper are compared with the statistics suggested by Ihm [2]. It turns out that the statistics suggested in [2] may lead to anticonservative decisions, whereas the “canonical statistics” suggested by Zelen [4] and considered in the present paper keep the level quite well or may lead to slightly conservative decisions, if there are considerable selection effects.


1963 ◽  
Vol 09 (02) ◽  
pp. 472-474 ◽  
Author(s):  
W Dick ◽  
W Schneider ◽  
K Brockmüller ◽  
W Mayer

Summary: A comparison between the distribution of blood groups in 461 patients suffering from thromboembolic disorders and the normal distribution has shown a statistically confirmed predominance of group A1. On the other hand, blood groups O and A2 are distinctly less frequent than in the normal distribution.


2019 ◽  
Vol 10 (2) ◽  
pp. 117-125
Author(s):  
Dana Kubíčková ◽  
Vladimír Nulíček

The aim of the research project carried out at the University of Finance and Administration is to construct a new bankruptcy model. The intention is to use data from firms that had to cease their activities due to bankruptcy. The most common method for constructing bankruptcy models is multivariate discriminant analysis (MDA). It makes it possible to identify the indicators most sensitive to a company's future failure, which then form part of the bankruptcy model. One of the assumptions for using the MDA method and obtaining reliable results is that the input data are normally distributed and independent. This article presents the results of verifying this assumption, the third stage of the project. We found that the assumption is met for only a few selected indicators. Better results were achieved for the indicators in the set of prosperous companies and for those measured one year prior to failure. The indicators selected for constructing the bankruptcy model thus cannot be considered suitable for the MDA method.


2015 ◽  
Vol 47 (8) ◽  
pp. 24-40 ◽  
Author(s):  
Telman Abbas ogly Aliev ◽  
Naila F. Musaeva ◽  
Matanat Tair kyzy Suleymanova ◽  
Bahruz Ismail ogly Gazizade

2016 ◽  
Vol 48 (4) ◽  
pp. 39-55 ◽  
Author(s):  
Telman Abbas ogly Aliev ◽  
Naila Fuad kyzy Musaeva ◽  
Matanat Tair kyzy Suleymanova ◽  
Bahruz Ismail ogly Gazizade
