set splitting
Recently Published Documents


TOTAL DOCUMENTS: 34 (five years: 1)

H-INDEX: 11 (five years: 0)

PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0256152
Author(s):  
Chansik An ◽  
Yae Won Park ◽  
Sung Soo Ahn ◽  
Kyunghwa Han ◽  
Hwiyoung Kim ◽  
...  

This study aims to determine how randomly splitting a dataset into training and test sets affects the estimated performance of a machine learning model, and its gap from the test performance, under different conditions, using real-world brain tumor radiomics data. We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) “simple” task, glioblastomas [n = 109] vs. brain metastases [n = 58] and (2) “difficult” task, low- [n = 163] vs. high-grade [n = 95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training-test set pairs. For each dataset pair, the least absolute shrinkage and selection operator model was trained and evaluated using various validation methods in the training set and tested on the test set, using the area under the curve (AUC) as an evaluation metric. The AUCs in training and testing varied among the different training-test set pairs, especially with the undersampled datasets and the difficult task. The mean (±standard deviation) AUC difference between training and testing was 0.039 (±0.032) for the simple task without undersampling and 0.092 (±0.071) for the difficult task with undersampling. In one training-test set pair with the difficult task without undersampling, for example, the AUC was high in training but much lower in testing (0.882 and 0.667, respectively); in another dataset pair with the same task, however, the AUC was low in training but much higher in testing (0.709 and 0.911, respectively). When the AUC discrepancy between training and testing (the generalization gap) was large, none of the validation methods sufficiently reduced it. Our results suggest that machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially with small sample sizes.
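A minimal Python sketch of this repeated-splitting experiment, for illustration only: an L1-penalized logistic regression stands in for the LASSO model, and the synthetic data, 70/30 split ratio, and penalty strength are assumptions rather than the study's settings.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the radiomics features (sample size borrowed
    # from the n = 258 meningioma task; feature counts are illustrative).
    X, y = make_classification(n_samples=258, n_features=100,
                               n_informative=10, random_state=0)

    gaps = []
    for seed in range(1000):  # 1,000 different training-test set pairs
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=seed)
        model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
        model.fit(X_tr, y_tr)
        auc_tr = roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1])
        auc_te = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
        gaps.append(auc_tr - auc_te)  # per-split generalization gap

    print(f"mean AUC gap = {np.mean(gaps):.3f} (+/- {np.std(gaps):.3f})")

Plotting the distribution of the 1,000 gaps makes the split-to-split variability reported above directly visible.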


2020 ◽  
Author(s):  
Chansik An ◽  
Yae Won Park ◽  
Sung Soo Ahn ◽  
Kyunghwa Han ◽  
Hwiyoung Kim ◽  
...  

Abstract
Objective: This study aims to determine how randomly splitting a dataset into training and test sets affects the estimated performance of a machine learning model under different conditions, using real-world brain tumor radiomics data.
Materials and Methods: We conducted two classification tasks of different difficulty levels with magnetic resonance imaging (MRI) radiomics features: (1) “simple” task, glioblastomas [n=109] vs. brain metastases [n=58] and (2) “difficult” task, low- [n=163] vs. high-grade [n=95] meningiomas. Additionally, two undersampled datasets were created by randomly sampling 50% from these datasets. We performed random training-test set splitting for each dataset repeatedly to create 1,000 different training and test set pairs. For each dataset pair, the least absolute shrinkage and selection operator model was trained by five-fold cross-validation (CV) or nested CV, with or without repetitions, in the training set and tested on the test set, using the area under the curve (AUC) as an evaluation metric.
Results: The AUCs in CV and testing varied widely with data composition, especially with the undersampled datasets and the difficult task. The mean (±standard deviation) AUC difference between CV and testing was 0.029 (±0.022) for the simple task without undersampling and 0.108 (±0.079) for the difficult task with undersampling. In one training-test set pair, the AUC was high in CV but much lower in testing (0.840 and 0.650, respectively); in another dataset pair with the same task, however, the AUC was low in CV but much higher in testing (0.702 and 0.836, respectively). None of the CV methods helped overcome this issue.
Conclusions: Machine learning after a single random training-test set split may lead to unreliable results in radiomics studies, especially when the sample size is small.
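This version of the study evaluates five-fold and nested CV inside the training set. A minimal sketch of nested CV under assumed settings (synthetic data and an illustrative grid of penalty strengths): the inner loop tunes the L1 penalty, and the outer loop scores the tuned model on folds the tuning step never saw.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

    X, y = make_classification(n_samples=258, n_features=100,
                               n_informative=10, random_state=0)

    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

    # Inner CV: choose the L1 penalty strength C by AUC.
    search = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear"),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        scoring="roc_auc", cv=inner)

    # Outer CV: estimate performance on data unseen during tuning.
    nested_auc = cross_val_score(search, X, y, scoring="roc_auc", cv=outer)
    print(f"nested CV AUC = {nested_auc.mean():.3f} (+/- {nested_auc.std():.3f})")

As the abstract notes, even this more careful estimate can diverge from the test-set AUC when the test set itself is a small random sample.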


2020 ◽  
Vol 365 ◽  
pp. 112961
Author(s):  
John M. Harmon ◽  
Daniel Arthur ◽  
José E. Andrade

Automatica ◽  
2020 ◽  
Vol 111 ◽  
pp. 108602
Author(s):  
Xuhui Feng ◽  
Mario E. Villanueva ◽  
Boris Houska

2017 ◽  
Vol 9 (2) ◽  
pp. 134-143
Author(s):  
Mihai Oltean

Abstract
We describe an optical device, based on time delays, for solving the set splitting problem, a well-known NP-complete problem. The device has a graph-like structure, and light traverses it from a start node to a destination node. All possible (potential) paths in the graph are generated, and at the destination we check which ones fully satisfy the problem's constraints.
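For reference, a brute-force Python sketch of the set splitting problem itself (an illustration of the decision problem, not of the optical device): enumerating every 2-coloring of the universe plays the role of generating all possible paths, and the final check mirrors verifying the constraints at the destination node.

    from itertools import product

    def set_splitting(universe, family):
        """Return a 2-partition of universe that splits every set in family,
        or None if no such partition exists."""
        elems = sorted(universe)
        for colors in product((0, 1), repeat=len(elems)):  # one "path" each
            side = dict(zip(elems, colors))
            # A subset is split when it touches both sides of the partition.
            if all(len({side[x] for x in s}) == 2 for s in family):
                part = {x for x in elems if side[x] == 0}
                return part, set(universe) - part
        return None

    print(set_splitting({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}]))

The exhaustive enumeration is exponential in the size of the universe, as expected for an NP-complete problem; it is this enumeration that the device delegates to light.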


2017 ◽  
Vol 27 (6) ◽  
Author(s):  
Alexander M. Chudnov

Abstract
We study conditions for the existence of coalition games whose result is invariant under cyclic shifts of the players' sequence numbers. Given a total number


2014 ◽  
Vol 11 (3) ◽  
pp. 899-900
Author(s):  
Zhaocai Wang ◽  
Chengpei Tang ◽  
Haifeng Liu ◽  
Renlin Pei

2013 ◽  
Vol 23 (1) ◽  
pp. 31-41 ◽  
Author(s):  
Jozef Kratica

In this paper, an electromagnetism-like approach (EM) is applied to solving the maximum set splitting problem (MSSP). A hybrid approach, consisting of movement based on attraction-repulsion mechanisms combined with the proposed scaling technique, directs EM toward promising search regions. A fast implementation of the local search procedure further improves the efficiency of the overall EM system. The performance of the proposed EM approach is evaluated on two classes of instances from the literature: minimum hitting set and Steiner triple systems. The results show that, with one exception, EM reaches optimal solutions on minimum hitting set instances with up to 500 elements and 50,000 subsets. It also reaches all optimal/best-known solutions for Steiner triple systems.
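A minimal Python sketch of the electromagnetism-like idea applied to MSSP, under assumed parameters and with a simplified force step (not Kratica's implementation): candidate partitions live as points in [0, 1]^n, points drift toward the best-scoring point, and each point is rounded to a 0/1 partition to count how many subsets it splits.

    import random

    def split_count(bits, family):
        """Objective: number of subsets with elements on both sides."""
        return sum(1 for s in family if len({bits[x] for x in s}) == 2)

    def em_mssp(n, family, pop=20, iters=200, step=0.1, seed=0):
        rng = random.Random(seed)
        points = [[rng.random() for _ in range(n)] for _ in range(pop)]
        best_bits, best_val = None, -1
        for _ in range(iters):
            scores = [split_count([round(v) for v in p], family) for p in points]
            hi = max(range(pop), key=lambda i: scores[i])
            if scores[hi] > best_val:
                best_val = scores[hi]
                best_bits = [round(v) for v in points[hi]]
            for i in range(pop):
                if i == hi:
                    continue
                # Attraction toward the best point plus a little noise: a crude
                # stand-in for the full charge-based attraction-repulsion forces.
                points[i] = [min(1.0, max(0.0, v + step * (b - v)
                                          + rng.uniform(-0.05, 0.05)))
                             for v, b in zip(points[i], points[hi])]
        return best_bits, best_val

    # Elements 0..3; the partition {0, 2} vs. {1, 3} splits all four subsets.
    print(em_mssp(4, [{0, 1}, {1, 2}, {2, 3}, {0, 3}]))

The actual algorithm adds the scaling technique and a dedicated local search on top of the force-based movement.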

