Test-Adequacy and Statistical Testing: Combining Different Properties of a Test-Set

This paper considers a statistical testing technique for comparing the metric values of machine learning models on a test set. Because metric values depend on the data as well as on the models, different models may turn out to be best on different test sets. For this reason, the traditional approach of simply comparing metric values on a single test set is often insufficient. Results obtained through cross-validation are sometimes compared statistically, but in that case the independence of the measurements cannot be guaranteed, which rules out Student's t-test. Tests that do not require independent measurements exist, but they have less power. For additive metrics, the paper proposes dividing the test sample into N parts and computing the metric values on each part. Since the value on each part is a sum of independent random variables, the central limit theorem implies that the metric values on the N parts are realizations of an approximately normally distributed random variable. To estimate the required sample size, the paper proposes using normality tests and quantile-quantile plots. A modification of Student's t-test can then be used to compare the mean metric values. A simplified approach is also considered, in which confidence intervals are built for the base model; a model whose metric values fall outside this interval behaves differently from the base model. This approach requires less computation, but an experimental analysis of the binary cross-entropy metric for CTR (Click-Through Rate) prediction models showed that it is coarser than the first one.
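
The splitting procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which modification of Student's t-test is used, so Welch's unequal-variance statistic is assumed here, and the simplified check is assumed to be a normal-approximation interval for the base model's mean. All function names and the synthetic click data are hypothetical.

```python
import math
import random
import statistics


def bce(y, p, eps=1e-12):
    """Binary cross-entropy of one prediction (an additive, per-example loss)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def part_means(labels, preds, n_parts):
    """Split the test set into n_parts contiguous chunks; return mean BCE per chunk."""
    size = len(labels) // n_parts
    means = []
    for i in range(n_parts):
        ys = labels[i * size:(i + 1) * size]
        ps = preds[i * size:(i + 1) * size]
        means.append(statistics.fmean(bce(y, p) for y, p in zip(ys, ps)))
    return means


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (assumed variant)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def base_mean_interval(a, z=1.96):
    """Approximate 95% interval for the base model's mean per-part metric (assumption)."""
    m = statistics.fmean(a)
    half = z * statistics.stdev(a) / math.sqrt(len(a))
    return m - half, m + half


# Synthetic CTR-like data (purely illustrative, not from the paper).
random.seed(0)
n = 20_000
true_p = [random.uniform(0.02, 0.3) for _ in range(n)]
labels = [1 if random.random() < q else 0 for q in true_p]
preds_base = true_p                                   # well-calibrated base model
preds_other = [min(q * 1.5, 0.99) for q in true_p]    # deliberately miscalibrated model

base_parts = part_means(labels, preds_base, 20)
other_parts = part_means(labels, preds_other, 20)

t_stat, dof = welch_t(base_parts, other_parts)
low, high = base_mean_interval(base_parts)
other_mean = statistics.fmean(other_parts)

print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
print(f"base interval = ({low:.4f}, {high:.4f}), other mean = {other_mean:.4f}")
print("other model inside base interval:", low <= other_mean <= high)
```

In practice the t statistic would be converted to a p-value with a t-distribution (for example via a statistics library), and the per-part values would first be checked for normality, as the abstract suggests, before choosing N.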

Revisiting the relationship between fault detection, test adequacy criteria, and test set size

Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering ◽

10.1145/3324884.3416667 ◽

2020 ◽

Author(s):

Yiqun T. Chen ◽

Rahul Gopinath ◽

Anita Tadakamalla ◽

Michael D. Ernst ◽

Reid Holmes ◽

...

Keyword(s):

Fault Detection ◽

Test Set ◽

Set Size ◽

The Relationship ◽

Test Adequacy

Using Connectionist Modules for Decision Support

Methods of Information in Medicine ◽

10.1055/s-0038-1634790 ◽

1990 ◽

Vol 29 (03) ◽

pp. 167-181 ◽

Cited By ~ 6

Author(s):

G. Hripcsak

Keyword(s):

Decision Support ◽

Standard Deviation ◽

Confidence Interval ◽

Posterior Probability ◽

Back Propagation ◽

Connectionist Model ◽

Test Set ◽

The Third ◽

Independent Test ◽

Better Than

A connectionist model for decision support was constructed out of several back-propagation modules. Manifestations serve as input to the model; they may be real-valued, and the confidence in their measurement may be specified. The model produces as its output the posterior probability of disease. The model was trained on 1,000 cases taken from a simulated underlying population with three conditionally independent manifestations. The first manifestation had a linear relationship between value and posterior probability of disease, the second had a stepped relationship, and the third was normally distributed. An independent test set of 30,000 cases showed that the model was better able to estimate the posterior probability of disease (the standard deviation of residuals was 0.046, with a 95% confidence interval of 0.046-0.047) than a model constructed using logistic regression (with a standard deviation of residuals of 0.062, with a 95% confidence interval of 0.062-0.063). The model fitted the normal and stepped manifestations better than the linear one. It accommodated intermediate levels of confidence well.

Hubungan Religiusitas dengan Citra Tubuh pada Wanita Dewasa Awal [The Relationship between Religiosity and Body Image in Early Adult Women]

Jurnal Psikologi Islam dan Budaya ◽

10.15575/jpib.v1i1.2076 ◽

2018 ◽

Vol 1 (1) ◽

pp. 9-28

Author(s):

Dessy Sumanty ◽

Deden Sudirman ◽

Diah Puspasari

Keyword(s):

Body Image ◽

Correlation Coefficient ◽

Research Design ◽

Calculation Result ◽

The Body ◽

Statistical Testing ◽

P Value ◽

Significance Level ◽

Correlational Research

This research attempts to relate the body image phenomenon to the subjects' level of religiosity. It used a correlational research design involving 332 respondents. The hypothesis was tested with Spearman's rank correlation. At the 95% confidence level (α = 0.05), the calculation yields a correlation coefficient of 0.083 and a p-value of 0.129. Because the p-value exceeds α, H0 is not rejected. It can be concluded that there is no significant relationship between religiosity and body image.
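
As a rough plausibility check on the reported figures, the standard large-sample approximation t = r·sqrt((n − 2)/(1 − r²)) can be applied to the Spearman coefficient. The sketch below is not the study's own calculation: it uses a normal approximation to the t-distribution, which is an assumption made only to keep the example dependency-free and is reasonable at 330 degrees of freedom. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def spearman_t(r, n):
    """Large-sample t statistic for a rank correlation r from n observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def two_sided_p_normal(t):
    """Two-sided p-value using a normal approximation to the t-distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


t_value = spearman_t(0.083, 332)   # figures reported in the abstract
p_value = two_sided_p_normal(t_value)

print(f"t = {t_value:.3f}, approximate two-sided p = {p_value:.3f}")
print("reject H0 at alpha = 0.05:", p_value < 0.05)
```

This gives a t statistic of about 1.51 and a p-value of roughly 0.13, in line with the reported 0.129 once rounding of the coefficient is taken into account, so the decision not to reject H0 at α = 0.05 follows.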

A Statistical Testing Method for Accurate Assessment of Packet Loss Probability

IEICE Transactions on Communications ◽

10.1587/transcom.e95.b.2968 ◽

2012 ◽

Vol E95.B (9) ◽

pp. 2968-2971

Author(s):

Iksoon HWANG ◽

Jaesung PARK

Keyword(s):

Packet Loss ◽

Loss Probability ◽

Statistical Testing ◽

Accurate Assessment ◽

Packet Loss Probability ◽

Testing Method ◽

Statistical Testing Method

RetroBioCat: Computer-Aided Synthesis Planning for Biocatalytic Reactions and Cascades

10.26434/chemrxiv.12571235.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

William Finnigan ◽

Lorna J. Hepworth ◽

Nicholas J. Turner ◽

Sabine Flitsch

Keyword(s):

Organic Chemistry ◽

Computer Aided Design ◽

Selective Synthesis ◽

Test Set ◽

Synthesis Design ◽

Computer Aided ◽

Synthesis Planning ◽

Enzymatic Cascades ◽

Target Molecules ◽

Aided Design

As the enzyme toolbox for biocatalysis has expanded, so has the potential for the construction of powerful enzymatic cascades for efficient and selective synthesis of target molecules. Additionally, recent advances in computer-aided synthesis planning (CASP) are revolutionizing synthesis design in both synthetic biology and organic chemistry. However, the potential for biocatalysis is not well captured by tools currently available in either field. Here we present RetroBioCat, an intuitive and accessible tool for computer-aided design of biocatalytic cascades, freely available at retrobiocat.com. Our approach uses a set of expertly encoded reaction rules encompassing the enzyme toolbox for biocatalysis, and a system for identifying literature precedent for enzymes with the correct substrate specificity where this is available. Applying these rules for automated biocatalytic retrosynthesis, we show our tool to be capable of identifying promising biocatalytic pathways to target molecules, validated using a test-set of recent cascades described in the literature.

PREDIKSI KUALITAS AIR SUNGAI CILIWUNG DENGAN MENGGUNAKAN ALGORITMA POHON KEPUTUSAN [Prediction of Ciliwung River Water Quality Using the Decision Tree Algorithm]

Jurnal Air Indonesia ◽

10.29122/jai.v12i2.4364 ◽

2021 ◽

Vol 12 (2) ◽

Author(s):

Mohammad Haekal ◽

Henki Bayu Seta ◽

Mayanda Mega Santoni

Keyword(s):

Data Mining ◽

Decision Tree ◽

Cross Validation ◽

Online Monitoring ◽

Training Set ◽

Microsoft Excel ◽

Test Set

To predict the water quality of the Ciliwung River, data from online monitoring were processed using a data mining method. In this method, the monitoring data were first tabulated in Microsoft Excel and then processed into a decision tree, using the Decision Tree algorithm in the WEKA application. The decision tree method was chosen because it is simpler, easy to understand, and has a very high level of accuracy. A total of 5,476 Ciliwung River water quality monitoring records were processed. According to the decision tree classification, 1,059 of these 5,476 records (19.3242%) indicate that the Ciliwung River is Not Polluted, and 4,417 records (80.6758%) indicate that it is Polluted. The monitoring data were then evaluated using four test options: Use Training Set, Supplied Test Set, Cross-Validation with 10 folds, and Percentage Split 66%. All four test options showed very high accuracy, above 99%. From these results it can be predicted that the Ciliwung River is indicated as a polluted river with reference to Government Regulation of the Republic of Indonesia Number 82 of 2001, and it is also found that using the WEKA application with the Decision Tree algorithm to process monitoring data on three parameters (pH, DO, and nitrate) is very accurate and appropriate. Keywords: river water quality, data mining, decision tree algorithm, WEKA application.
