Wrapper feature selection for small sample size data driven by complete error estimates
2012, Vol. 108(1), pp. 138-150
Author(s): Martin Macaš, Lenka Lhotská, Eduard Bakstein, Daniel Novák, Jiří Wild, et al.

2020
Author(s): Salem Alelyani

Abstract In the medical field, distinguishing the genes that are relevant to a specific disease, say colon cancer, is crucial to finding a cure and to understanding its causes and subsequent complications. Medical datasets typically have very high dimensionality together with a considerably small sample size. For domain experts such as biologists, the task of identifying these genes has therefore become very challenging, to say the least. Feature selection is a technique that aims to select these genes (features, in machine-learning terms) with respect to the disease. However, learning from a medical dataset to identify relevant features suffers from the curse of dimensionality: with a large number of features and a small sample size, the selection usually returns a different subset each time a new sample is introduced into the dataset. This selection instability is intrinsically related to data variance, and we assume that reducing data variance improves selection stability. In this paper, we propose an ensemble approach based on the bagging technique to improve feature selection stability in medical datasets via data variance reduction. We conducted an experiment on four microarray datasets, each of which suffers from high dimensionality and a relatively small sample size. On each dataset, we applied five well-known feature selection algorithms to select varying numbers of features. The stability and accuracy results show that the bagging technique improves both selection stability and classification accuracy.
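The bagging scheme the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's exact method: the base selector (a simple class-mean-difference filter), the vote-counting aggregation rule, and all function names are assumptions introduced for the example, and a stability score is computed as average pairwise Jaccard similarity between selected subsets, one common way to quantify selection stability.

```python
import random
from collections import Counter

def select_top_k(X, y, k):
    """Illustrative base selector: rank features by the absolute
    difference of class means (a simple univariate filter score)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # Guard against a degenerate bootstrap sample with one class only.
        s = abs(sum(pos) / len(pos) - sum(neg) / len(neg)) if pos and neg else 0.0
        scores.append((s, j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}

def bagged_selection(X, y, k, n_bags=20, seed=0):
    """Bagging for feature selection: run the base selector on
    bootstrap resamples, then keep the k features selected most
    often across bags (variance reduction by aggregation)."""
    rng = random.Random(seed)
    n = len(X)
    votes = Counter()
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        votes.update(select_top_k(Xb, yb, k))
    return {j for j, _ in votes.most_common(k)}

def jaccard_stability(subsets):
    """Average pairwise Jaccard similarity between selected subsets;
    1.0 means identical selections, 0.0 means disjoint ones."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

On a toy dataset where only feature 0 carries class signal, the bagged vote reliably keeps that feature even as individual bootstrap runs disagree on the noise features, which is the variance-reduction effect the abstract attributes to bagging.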


2016, Vol. 143, pp. 127-142
Author(s): Kai Dong, Herbert Pang, Tiejun Tong, Marc G. Genton
